Background



Silverbrook's bilithic Memjet printheads are the target printheads for printing systems
which will be controlled by SoPEC and MoPEC devices.

This document presents the format and structure of these printheads, and describes
their possible arrangements in the target systems. It also defines a set of terms used to
differentiate between the types of printheads and the systems which use them.



Currently, this document is only concerned with the structure of the printheads and their 
systems, with regard to the way in which dot data is loaded. 

Refer to the Bilithic Printhead Specification [1] for the complete description of the
functionality of these devices.

This document relies on certain definitions and details presented in Bilithic Printhead 
Specification [1]. 



It is intended that this document be used as a reference for engineers involved in the 
design work on the SoPEC and MoPEC projects. 



1.1 Companion Documents

1.2 Readership
2 Definitions

This document presents terminology and definitions used to describe the bilithic printhead
systems. These terms and definitions are as follows:

• Printhead Type - There are 3 parameters which define the type of printhead used in a 
system: 

• Direction of the data flow through the printhead (clockwise or anti-clockwise, with 
the printhead shooting ink down onto the page). 

• Location of the left-most dot (upper row or lower row, with respect to ). 

• Printhead footprint (type A or type B, characterized by the data pin being on the left 
or the right of where is at the top of the printhead). 

• Printhead Arrangement - Even though there are 8 printhead types, each arrangement
has to use a specific pairing of printheads, as discussed in Section 3. This gives 4
pairs of printheads. However, because the paper can flow in either direction with
respect to the printheads, there are a total of eight possible arrangements. For example,
Arrangement 1 has a Type 0 printhead on the left with respect to the paper flow, and
a Type 1 printhead on the right. Arrangement 2 uses the same printhead pair as
Arrangement 1, but the paper flows in the opposite direction.

• Color 0 is always the first color plane encountered by the paper. 

• Dot 0 is defined as the nozzle which can print a dot on the left-most side of the page.

• The Even Plane of a color corresponds to the row of nozzles that prints dot 0. 

Note that throughout this document, where the various printheads and systems are pre- 
sented, the printheads always shoot ink down onto the page. 

Figure 1 shows the 8 different possible printhead types. Type 0 is identical to the Right
Printhead presented in Figure 3 in [1], and Type 1 is the same as the Left Printhead as
defined in [1].
While the printheads shown in Figure 1 appear to be of equal width (having the same number
of nozzles), it is important to remember that in a typical system, a pair of unequal-sized
printheads may be used.
Figure 1. Printhead Types 0 to 7

Table 1 defines the printhead pairing and location of each printhead type, with respect
to the flow of paper, for the 8 possible arrangements.

Table 1. Definition of the different printhead arrangements
3 Bilithic Printhead Systems

When using the bilithic printheads, the position of the power/gnd bars coupled with the
physical footprint of the printheads means that we must use a specific pairing of printheads
together for printing on the same side of an A4 (or wider) page, e.g. we must always use a
Type 0 printhead with a Type 1 printhead, etc.

While a given printing system can use any one of the eight possible arrangements of
printheads, this document only presents two of them, Arrangement 1 and Arrangement 2, for
purposes of illustration. These two arrangements are discussed in subsequent sections of
this document. However, the other 6 possibilities also need to be considered.

The main difference between the two printhead arrangements discussed in this document
is the direction of the paper flow. Because of this, the dot data has to be loaded differently
in Arrangement 1 compared to Arrangement 2, in order to render the page correctly.
3.1 Example 1: Printhead Arrangement 1

Figure 2 shows an Arrangement 1 printing setup, where the bilithic printheads are 
arranged as follows: 

• The Type 0 printhead is on the left with respect to the direction of the paper flow. 

• The Type 1 printhead is on the right. 



(Figure labels: Type 0 Printhead, Type 1 Printhead, Gnd, Direction of Paper Flow. The
printheads are facing downwards; the ink is being shot down onto the page.)

Figure 2. Identification of printhead nozzles and shift-register sequences for
printheads in Arrangement 1
Table 2 lists the order in which the dot data needs to be loaded into the above printhead
system, to ensure color 0-dot 0 appears on the left side of the printed page. 

Table 2. Order in which the even and odd dots are loaded for printhead Arrangement 1

        Left side of printed page             Right side of printed page
Odd     Loaded second, in descending order.   Loaded first, in descending order.
Even    Loaded first, in ascending order.     Loaded second, in ascending order.



Figure 3 shows how the dot data is demultiplexed within the printheads. 
Figure 3. Demultiplexing of data within the printheads in Arrangement 1

Figure 4 and Figure 5 show the way in which the dot data needs to be loaded into the
printheads in Arrangement 1, to ensure that color 0-dot 0 appears on the left side of the
printed page.
Figure 4. Signalling for a Type 0 printhead in Arrangement 1

Figure 5. Signalling for a Type 1 printhead in Arrangement 1

3.2 Example 2: Printhead Arrangement 2

Figure 6 shows an Arrangement 2 printing setup, where the bilithic printheads are 
arranged as follows: 

• The Type 1 printhead is on the left with respect to the direction of the paper flow.

• The Type 0 printhead is on the right. 



(Figure labels: Type 0 Printhead, Type 1 Printhead, Gnd, Direction of Paper Flow. The
printheads are facing downwards; the ink is being shot down onto the page.)

Figure 6. Identification of printhead nozzles and shift-register sequences for
printheads in Arrangement 2
Table 3 lists the order in which the dot data needs to be loaded into the above printhead
system, to ensure color 0-dot 0 appears on the left side of the printed page. 



Table 3. Order in which the even and odd dots are loaded for printhead Arrangement 2

        Left side of printed page             Right side of printed page
Odd     Loaded first, in descending order.    Loaded second, in descending order.
Even    Loaded second, in ascending order.    Loaded first, in ascending order.



Figure 7 shows how the dot data is demultiplexed within the printheads. 
Figure 7. Demultiplexing of data within the printheads in Arrangement 2

Figure 8 and Figure 9 show the way in which the dot data needs to be loaded into the
printheads in Arrangement 2, to ensure that color 0-dot 0 appears on the left side of the
printed page.

Figure 8. Signalling for a Type 0 printhead in Arrangement 2 
Figure 9. Signalling for a Type 1 printhead in Arrangement 2

3.3 Conclusions



Comparing the signalling diagrams for Arrangement 1 with those shown for Arrangement
2, it can be seen that the color/dot sequence output for a printhead type in Arrangement 1
is the reverse of the sequence for the same printhead in Arrangement 2, in terms of the order
in which the color plane data is output, as well as whether even or odd data is output first.
However, the order within a color plane remains the same, i.e. odd descending, even
ascending.

From Figure 10 and Table 4, it can be seen that the plane which has to be loaded first (i.e. 
even or odd) depends on the arrangement. Also, the order in which the dots have to be 
loaded (e.g. even ascending or descending etc.) is dependent on the arrangement. 

If the device controlling the printheads can re-order the bits according to the following
criteria, then it should be able to operate in all the possible printhead arrangements:

• Be able to output the even or odd plane first.

• Be able to output the even and odd planes in either ascending or descending order,
independently.

• Be able to reverse the sequence in which the color planes of a single dot are output to
the printhead.
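The three re-ordering capabilities listed above can be sketched as a small sequence generator. This is a minimal sketch under stated assumptions, not SoPEC or MoPEC logic: the function names `plane_dots` and `load_order`, the use of local dot indices 0..num_dots-1 within one printhead, and treating a dot's colour group as a plain tuple are all illustrative. Each call chooses which plane goes first, the direction of each plane independently, and whether the colour order within a dot is reversed.

```python
from typing import List, Sequence, Tuple


def plane_dots(num_dots: int, plane: str, ascending: bool) -> List[int]:
    """Dot indices of one plane: even plane = 0, 2, 4, ...; odd plane = 1, 3, 5, ..."""
    start = 0 if plane == "even" else 1
    dots = list(range(start, num_dots, 2))
    return dots if ascending else dots[::-1]


def load_order(num_dots: int, first_plane: str, even_ascending: bool,
               odd_ascending: bool, colours: Sequence[int],
               reverse_colours: bool = False) -> List[Tuple[int, int]]:
    """(dot, colour) pairs in the order they would be shifted into one printhead."""
    if first_plane not in ("even", "odd"):
        raise ValueError("first_plane must be 'even' or 'odd'")
    second_plane = "odd" if first_plane == "even" else "even"
    ascending = {"even": even_ascending, "odd": odd_ascending}
    # Optionally reverse the sequence in which the colour planes of a dot are output.
    colour_order = list(colours)[::-1] if reverse_colours else list(colours)
    sequence: List[Tuple[int, int]] = []
    for plane in (first_plane, second_plane):
        for dot in plane_dots(num_dots, plane, ascending[plane]):
            sequence.extend((dot, c) for c in colour_order)
    return sequence


# Arrangement 1, left side of the printed page (Table 4):
# even ascending loaded first, odd descending loaded second.
left = load_order(6, "even", even_ascending=True, odd_ascending=False, colours=(0, 1, 2))
print([dot for dot, _ in left[::3]])  # [0, 2, 4, 5, 3, 1]
```

Each of the eight rows of Table 4 then corresponds to one choice of `first_plane`, `even_ascending` and `odd_ascending` per side of the page.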
Figure 10. All 8 Printhead Arrangements

Table 4. Order in which even and odd dots and planes are loaded into the various
printhead arrangements

Arrangement   Left side of printed page        Right side of printed page
1             Even ascending loaded first      Odd descending loaded first
              Odd descending loaded second     Even ascending loaded second
2             Odd descending loaded first      Even ascending loaded first
              Even ascending loaded second     Odd descending loaded second
3             Odd ascending loaded first       Even descending loaded first
              Even descending loaded second    Odd ascending loaded second
4             Even descending loaded first     Odd ascending loaded first
              Odd ascending loaded second      Even descending loaded second
5             Odd ascending loaded first       Even descending loaded first
              Even descending loaded second    Odd ascending loaded second
6             Even descending loaded first     Odd ascending loaded first
              Odd ascending loaded second      Even descending loaded second
7             Even ascending loaded first      Odd descending loaded first
              Odd descending loaded second     Even ascending loaded second
8             Odd descending loaded first      Even ascending loaded first
              Even ascending loaded second     Odd descending loaded second
1.0 Basic Requirements

To create a two-part printhead of A4/Letter portrait width, to print a page in 2 seconds.
Matching left/right chips can be of different lengths to make up this length, facilitating
increased wafer usage. The left and right chips are to be imaged on an 8 inch wafer
by "stitching" reticle images.

The memjet nozzles have a horizontal pitch of 32 um, and two rows of nozzles are used for
a single colour. These rows have a horizontal offset of 16 um. This gives an effective
dot pitch of 16 um, or 62.5 dots per mm, or 1587.5 dots per inch, close enough to
market as 1600 dpi.
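The pitch arithmetic above can be checked in a few lines. The constant names are illustrative; the only inputs are the 32 um nozzle pitch and the 16 um row offset stated in the text.

```python
NOZZLE_PITCH_UM = 32.0   # horizontal pitch of the nozzles within one row
ROW_OFFSET_UM = 16.0     # horizontal offset between the two rows of one colour
UM_PER_INCH = 25400.0

# Interleaving two rows offset by half a pitch halves the effective dot pitch.
dot_pitch_um = NOZZLE_PITCH_UM / 2
assert dot_pitch_um == ROW_OFFSET_UM

dots_per_mm = 1000.0 / dot_pitch_um         # 62.5
dots_per_inch = UM_PER_INCH / dot_pitch_um  # 1587.5, marketed as 1600 dpi
print(dots_per_mm, dots_per_inch)           # 62.5 1587.5
```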

The first nozzle of the right chip should have a 32 um horizontal offset from the final
nozzle of the left chip for the same colour row. No ink nozzle overlap scheme (for the
same colour) is employed.

1.1 Power Supply

Vdd/Vpos and Ground supply is made through 30 um wide pads along the length of the
chip, using conductive adhesive to bus bars beside the chips. Vdd/Vpos is 3.3 Volts.
(12V was considered for Vpos, but routing of CMOS Vdd at 3.3V over the length of the
chips would be a problem; this will be revisited.)

1.2 MEMS cells

The current memjet device requires 180 nJ of energy to fire, with a pulse of current for
1 us. Assuming 95% efficiency, this requires a 55 ohm actuator drawing 57.4 mA
during this pulse.

1.2.1 ISSUE!!! 

For 1 page per 2 seconds, or ~300 mm * 62.5 (dots/mm) / 2 s ~= 10 kHz, or 100 us
per line. With a 1 us fire pulse cycle, every 100th nozzle needs to fire at the same time.
(Looking ahead) we have 13824 nozzles across the page, so we fire 138 nozzles at a
time. That is about 8 Amperes if all nozzles fire.

That 8 Amperes is for only 1 colour! 16 A * 6 colours = 96 A for all colours.

How many colours could print at the same time? In CMYK colour space only 2 colours
at a time are required to map any colour (saturated, full-bleed background). But the
fixative ink is also required, and with 12% coverage of infrared ink, 3.12 inks would be
the "best" worst case. Unfortunately, the peak could be all 6 inks, as the colours are not
aligned to print the same point at the same time.

With 138 nozzles * 3.12 inks firing at the same time, at 180 nJ each in 1 us, the
memjet nozzles will average (138*3.12*180)/1000 = 78 W running at full speed.
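The budget above can be reproduced step by step. This is a back-of-envelope sketch using only the figures quoted in this section (300 mm page, 2 s per page, 1 us fire pulse, 57.4 mA and 180 nJ per firing, 3.12 inks on average); like the text, it rounds the line rate to 10 kHz (100 us per line) before dividing the nozzles into fire slots. The constant names are illustrative.

```python
PAGE_LENGTH_MM = 300.0
DOTS_PER_MM = 62.5
SECONDS_PER_PAGE = 2.0
LINE_TIME_US = 100.0           # ~10 kHz line rate, rounded as in the text
FIRE_PULSE_US = 1.0
NOZZLES_PER_COLOUR = 13824
CURRENT_PER_NOZZLE_A = 0.0574  # 57.4 mA while a nozzle fires
ENERGY_PER_FIRE_NJ = 180.0
AVG_INKS_FIRING = 3.12         # 2 CMYK colours + fixative + 12% infrared

line_rate_hz = PAGE_LENGTH_MM * DOTS_PER_MM / SECONDS_PER_PAGE   # 9375 Hz, i.e. ~10 kHz
fire_slots_per_line = int(LINE_TIME_US / FIRE_PULSE_US)          # 100 slots per line
nozzles_per_slot = NOZZLES_PER_COLOUR // fire_slots_per_line     # 138 nozzles
current_per_colour_a = nozzles_per_slot * CURRENT_PER_NOZZLE_A   # ~7.9 A ("about 8 A")

# 180 nJ delivered in a 1 us pulse is 180 mW per firing nozzle.
power_per_nozzle_w = ENERGY_PER_FIRE_NJ / FIRE_PULSE_US / 1000.0
average_power_w = nozzles_per_slot * AVG_INKS_FIRING * power_per_nozzle_w  # ~77.5 W

print(line_rate_hz, nozzles_per_slot, round(current_per_colour_a, 2), round(average_power_w, 1))
```

The unrounded results (about 7.9 A and 77.5 W) agree with the rounded 8 A and 78 W quoted above.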
1.2.2 64 um unit cell height

This cell would have 4 line spacing between the odd and even dots, and 8 line spacing 
between adjacent colours. 

1.2.3 80 um unit cell height

This cell would have 5 line spacing between the odd and even dots, and 10 line spacing 
between adjacent colours. 

1.3 Versions 

1.3.1 6 Colour 1600 dpi with 64 um unit cell 

Left and Right Chip. This version will not be prototyped. 

1.3.2 6 Colour 1600 dpi with 80 um unit cell

Left and Right Chip. 

1.3.3 4 Colour 800 dpi with 80 um unit cell 

For camera application. Single nozzle row per colour. 
This version will not be prototyped. 

1.4 Air Supply 

Air must be supplied to the MEMS region through holes in the chip. 

2.0 Head Sizes 

The combined heads have 13824 nozzles per colour, totalling 221.184 mm of print area:
enough to provide full bleed for A4 (210 mm) and Letter (8.5 inch or 215.9 mm).



TABLE 1. Head Combinations

Left Head                          Right Head
Stitch Parts  Nozzles per Colour   Stitch Parts  Nozzles per Colour
8             11160                2             2664
7             9744                 3             4080
6             8328                 4             5496
5             6912                 5             6912
4             5496                 6             8328
3             4080                 7             9744
2             2664                 8             11160



Nozzles per Colour is calculated as ((Stitch Parts - 1) * 118 + 104) * 12. Nozzles per row
is half this value. Most likely the 8:2 head set will not be manufactured. My current
wafer layout manages to avoid this set without any losses.
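The nozzle-count formula can be checked against Table 1 with a short script. The sketch assumes, as Table 1 suggests, that every head pair uses 10 stitch parts in total, and it uses the 16 um effective dot pitch from Section 1.0 to recover the 221.184 mm print width; `nozzles_per_colour` is an illustrative name.

```python
DOT_PITCH_UM = 16  # effective dot pitch (Section 1.0)


def nozzles_per_colour(stitch_parts: int) -> int:
    """Nozzles per colour for a head made of `stitch_parts` stitched reticle parts."""
    return ((stitch_parts - 1) * 118 + 104) * 12


for left_parts in range(8, 1, -1):
    right_parts = 10 - left_parts
    left, right = nozzles_per_colour(left_parts), nozzles_per_colour(right_parts)
    width_mm = (left + right) * DOT_PITCH_UM / 1000
    print(f"{left_parts}:{right_parts}  {left:5d} + {right:5d} = {left + right}  ({width_mm} mm)")
```

Every pair sums to 13824 nozzles per colour, because the two heads always contribute 8 * 118 + 2 * 104 = 1152 nozzle groups of 12 between them.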

3.0 Interface 

Each print head has the same I/O signals (but the Left and Right versions might have a
different pin out). Pins marked as common can be controlled by a shared signal from the
controller (SoPEC).

TABLE 2. I/O pins

Name        I/O  Function                                          Common  Max Speed (MHz)
Data[0-1]   I    Dot data for colours 0-5, using differential      No      300
                 signalling (DataL the complementary signal);
                 colours [0-2] on Data[0], colours [3-5] on Data[1]
            O    Feedback for CMOS testing (LSyncL=1, ReadL=0)
                 and (LSyncL=0, ReadL=0):
                 [0] - nozzle test result
                 [1] - temperature
DataL[0-1]  I    Complementary signal of Data[0-1]
            O    Feedback for CMOS testing (LSyncL=1, ReadL=0)
                 and (LSyncL=0, ReadL=0):
                 [0] - nozzle test result
                 [1] - temperature
SrClk       I    Dot data shift clock using differential           No (a)  600 (b)
                 signalling (SrClkL the complementary signal)
SrClkL      I    Complementary signal of SrClk
ReadL       I    Data[0-1]/DataL[0-1] in output mode (driving      Yes     1
                 non-differential)
FrClk       I    Fire pattern shift clock                          Yes     1
Pr          I    Pulse profile for all colours                     Yes     (c)
LSyncL      I    0 - Capture dot data for next print line          Yes     0.1 (d)

a. Functionally could be common, but for timing/electrical reasons should run point to point.
b. 300 MHz clock, so edges are at a 600 MHz rate.
c. 1 MHz cycle, but the resolution of the mark/space ratio may require 50 ns.
d. 10 kHz cycle, with a minimum low pulse of 10 ns (no maximum).
3.1 Dot Firing

To fire a nozzle, three signals are needed: the dot data, a fire signal, and a profile. When
all three signals are high, the nozzle will fire.



FIGURE 1. Print head structure




The dot data is provided to the chip through a dot shift register with input Data[x],
and clocked into the chip with SrClk. The dot data is multiplexed onto the Data signals,
as Dot[0-2] on Data[0] and Dot[3-5] on Data[1]. After the dots are shifted into the
dot shift register, this data is transferred into the dot latch with a low pulse on
LSyncL. The value in the dot latch forms the dot data used to fire the nozzle. The use
of the dot latch allows the next line of data to be loaded into the dot shift register
at the same time as the dot pattern in the dot latch is being fired.
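The double buffering between the dot shift register and the dot latch can be modelled in a few lines. This is a behavioural toy under stated assumptions, not the chip's logic: the class name `DotPath` is hypothetical, there is one shift register bit per nozzle, and the three-colour multiplexing onto each Data pin is ignored.

```python
from typing import List


class DotPath:
    """Toy model of a dot shift register feeding a dot latch."""

    def __init__(self, length: int) -> None:
        self.shift_reg: List[int] = [0] * length
        self.latch: List[int] = [0] * length

    def srclk(self, bit: int) -> None:
        # One SrClk: the new bit enters at position 0 and every bit moves one place along.
        self.shift_reg = [bit] + self.shift_reg[:-1]

    def lsyncl_pulse(self) -> None:
        # Low pulse on LSyncL: the shifted-in line is transferred into the dot latch.
        self.latch = list(self.shift_reg)


path = DotPath(4)
for bit in [1, 0, 0, 0]:           # line 1
    path.srclk(bit)
path.lsyncl_pulse()                # the latch now holds line 1 for firing
for bit in [0, 1, 1, 0]:           # line 2 is shifted in while line 1 is being fired
    path.srclk(bit)
print(path.latch, path.shift_reg)  # [0, 0, 0, 1] [0, 1, 1, 0]
```

The latch is untouched while line 2 is shifted in, which is what lets loading and firing overlap.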

Across the top of each column of 12 nozzles (2 of each colour: odd and even dots, 4 or 5
lines apart) are two fire register bits and a select register bit. The fire register bits
form the fire shift register, which runs the length of the chip and back again, with one
register bit in each direction of flow.
FIGURE 2. Column Structure

The select register bits form the Select Shift Register, which runs the length of the
chip. The select register bit selects which of the two fire register bits is used to enable
this column. A '0' in this register selects the forward-direction fire register, and a '1'
selects the reverse-direction fire register.

The third signal needed, the profile, is provided for all colours through input Pr, across the
whole colour row at the same time (with a slight propagation delay per column).
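The firing condition for a single nozzle (dot latch, the column's selected fire register bit, and the profile) can be written as a pair of boolean functions. The function and argument names are illustrative assumptions; the sketch only restates the rules given above.

```python
def column_fire(select_bit: int, forward_bit: int, reverse_bit: int) -> int:
    """Select '0' picks the forward-direction fire register bit, '1' the reverse one."""
    return reverse_bit if select_bit else forward_bit


def nozzle_fires(dot_latch_bit: int, select_bit: int, forward_bit: int,
                 reverse_bit: int, pr: int) -> bool:
    """A nozzle fires only when dot data, the column's fire signal and Pr are all high."""
    return bool(dot_latch_bit and column_fire(select_bit, forward_bit, reverse_bit) and pr)


print(nozzle_fires(1, 0, 1, 0, 1))  # True: forward bit selected and set
print(nozzle_fires(1, 1, 1, 0, 1))  # False: reverse bit selected but clear
```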
3.2 Dot Shift Register Orientation

The left-side print head (chip) and the right-side print head that form a complete bi-lithic
print head have different nozzle arrangements with respect to the mapping of the dot order
of the dot shift register to the dot position on the page.



FIGURE 3. Print head dot shift register dot mapping to page

Left Print Head - (n-m) nozzles    Right Print Head - m nozzles

With this mapping, the following data streams will need to be provided.



TABLE 3. Head Combinations shift patterns (n = 13824)

Size 7:3
  Left Head  (n-m = 9744):  [13822, 13820, 13818, ..., 4084, 4082, 4080] line y+5
                            [4081, 4083, 4085, ..., 13819, 13821, 13823] line y
  Right Head (m = 4080):    [1, 3, 5, ..., 4075, 4077, 4079] line y+5
                            [4078, 4076, 4074, ..., 4, 2, 0] line y

Size 6:4
  Left Head  (n-m = 8328):  [13822, 13820, 13818, ..., 5500, 5498, 5496] line y+5
                            [5497, 5499, 5501, ..., 13819, 13821, 13823] line y
  Right Head (m = 5496):    [1, 3, 5, ..., 5491, 5493, 5495] line y+5
                            [5494, 5492, 5490, ..., 4, 2, 0] line y

Size 5:5
  Left Head  (n-m = 6912):  [13822, 13820, 13818, ..., 6916, 6914, 6912] line y+5
                            [6913, 6915, 6917, ..., 13819, 13821, 13823] line y
  Right Head (m = 6912):    [1, 3, 5, ..., 6907, 6909, 6911] line y+5
                            [6910, 6908, 6906, ..., 4, 2, 0] line y

Size 4:6
  Left Head  (n-m = 5496):  [13822, 13820, 13818, ..., 8332, 8330, 8328] line y+5
                            [8329, 8331, 8333, ..., 13819, 13821, 13823] line y
  Right Head (m = 8328):    [1, 3, 5, ..., 8323, 8325, 8327] line y+5
                            [8326, 8324, 8322, ..., 4, 2, 0] line y

Size 3:7
  Left Head  (n-m = 4080):  [13822, 13820, 13818, ..., 9748, 9746, 9744] line y+5
                            [9745, 9747, 9749, ..., 13819, 13821, 13823] line y
  Right Head (m = 9744):    [1, 3, 5, ..., 9739, 9741, 9743] line y+5
                            [9742, 9740, 9738, ..., 4, 2, 0] line y
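The four dot orders in each row of Table 3 follow a simple pattern, so they can be generated from n and m rather than tabulated. This is a minimal sketch, assuming m is even (true for every row above); the function name and the dictionary keys are illustrative labels for the four streams.

```python
from typing import Dict, List


def shift_patterns(n: int, m: int) -> Dict[str, List[int]]:
    """Dot orders for a left head of n-m nozzles and a right head of m nozzles (Table 3)."""
    return {
        # Left head: even dots from n-2 down to m (line y+5), odd dots m+1 up to n-1 (line y).
        "left_y5": list(range(n - 2, m - 1, -2)),
        "left_y": list(range(m + 1, n, 2)),
        # Right head: odd dots 1 up to m-1 (line y+5), even dots m-2 down to 0 (line y).
        "right_y5": list(range(1, m, 2)),
        "right_y": list(range(m - 2, -1, -2)),
    }


p = shift_patterns(13824, 4080)                # the 7:3 row
print(p["left_y5"][:3], p["left_y5"][-3:])     # [13822, 13820, 13818] [4084, 4082, 4080]
print(p["right_y"][:3], p["right_y"][-3:])     # [4078, 4076, 4074] [4, 2, 0]
```

Between them, the four streams cover every dot 0..n-1 exactly once.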



The data needs to be multiplexed onto the data pins, such that Data[0] carries {(C0, C1, C2),
(C0, C1, C2), ...} in the above order, and Data[1] carries {(C3, C4, C5), (C3, C4, C5), ...}.
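The interleaving onto the two data pins can be sketched as follows. The function name `multiplex` and the representation of a print line as one list of bits per colour are assumptions for illustration; the dot order passed in would be one of the Table 3 streams. Each pin receives three bits per dot, so a head with L nozzles per colour needs 3L bits on each pin.

```python
from typing import Dict, List, Sequence


def multiplex(dot_order: Sequence[int],
              line: Sequence[Sequence[int]]) -> Dict[str, List[int]]:
    """line[c][d] is the bit for colour c at dot d; returns the Data[0] and Data[1] streams."""
    data0: List[int] = []
    data1: List[int] = []
    for d in dot_order:
        data0.extend(line[c][d] for c in (0, 1, 2))  # (C0, C1, C2) for each dot
        data1.extend(line[c][d] for c in (3, 4, 5))  # (C3, C4, C5) for each dot
    return {"Data[0]": data0, "Data[1]": data1}


# Six colours, four dots; colour 0 has dot 2 set, colour 4 has dot 0 set.
line = [[0, 0, 1, 0], [0] * 4, [0] * 4, [0] * 4, [1, 0, 0, 0], [0] * 4]
streams = multiplex([2, 0], line)
print(streams)  # {'Data[0]': [1, 0, 0, 0, 0, 0], 'Data[1]': [0, 0, 0, 0, 1, 0]}
```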



Figure 4 shows the timing of data transfer during normal printing mode. Note that SrClk
has a default state of high. If there are L nozzles per colour, SrClk would have 3L
pulses (and 3L+1 rising edges).
FIGURE 4. Data Timing During Printing

(Waveforms shown: SrClk, LSyncL.)



Data requires setup and hold times about the falling edge of SrClk. SrClk's default state
is high (it needs to return to high after the last data of the line). LSyncL rising requires
setup to the first rising edge of SrClk, and LSyncL must stay high during the data transfer.

3.3 Fire Shift Register 

The fire shift register controls the rate of nozzle fire. If the register is full of '1's, then
you could print the entire print head in a single FrClk cycle. You do not want to do
that (4800 A)!

Ideally, a '1' is shifted into the fire shift register in every nth position, and a '0' in all
other positions. In this manner, after n cycles of FrClk, the entire print head will be
printed.

The fire shift register and select shift register allow the generation of a horizontal
print line that, on close inspection, does not have the discontinuity of a "saw tooth"
pattern (Figure 5 a) and b)) but instead has the "shark's tooth" pattern of c).

FIGURE 5. Print quality

a) Printing every dot with all zeros in the fire select shift register

b) Printing every dot with all ones in the fire select shift register

c) Printing every dot with n zeros then n ones in the fire select shift registers



This is done by firing 2 nozzles in every group of 2n nozzles at the same time, starting with
the outer 2 nozzles and working towards the centre two (or starting from the centre and
working towards the outer two), at the fire rate controlled by FrClk.
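The fire order described above can be checked with a small simulation. This is a sketch under stated assumptions: columns are numbered 0..N-1 across one chip, N is a whole number of 2n groups, the select register holds runs of n '0's then n '1's, and each half of the fire register holds a '1' every n positions, with the forward '1' starting at the first column of each group and the reverse '1' at the last. The function name is illustrative. The simulation also confirms that every column is enabled exactly once in n FrClk cycles.

```python
from typing import List


def fire_schedule(num_columns: int, n: int) -> List[List[int]]:
    """Columns enabled on each of n FrClk cycles."""
    if num_columns % (2 * n):
        raise ValueError("sketch assumes a whole number of 2n groups")
    select = [(col // n) % 2 for col in range(num_columns)]  # n '0's, n '1's, ...
    schedule = []
    for t in range(n):
        # Forward half: the '1' that started at offset 0 has moved t places forward.
        forward = [(col - t) % n == 0 for col in range(num_columns)]
        # Reverse half: the '1' that started at offset n-1 has moved t places backward.
        reverse = [(col + t) % n == n - 1 for col in range(num_columns)]
        schedule.append([col for col in range(num_columns)
                         if (reverse[col] if select[col] else forward[col])])
    return schedule


for t, cols in enumerate(fire_schedule(8, 2)):
    print(t, cols)
# prints:
# 0 [0, 3, 4, 7]   outer pair of each 4-column group
# 1 [1, 2, 5, 6]   centre pair
```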
To achieve this fire pattern, the fire shift register and select shift register need to
be set up as shown in Figure 6.

FIGURE 6. Fire and Select Shift Register setup for printing

(Figure labels: fire shift register; select shift register. Bit pattern shown:
..0000000....0001111111....1110000000..)



The pattern has a '1' shifted into the fire shift register every nth position (where n is
usually a minimum of about 100), and n '1's followed by n '0's in the select shift
register. At the start of a print cycle, these patterns need to be aligned as above, with the
"1000..." of the forward half of the fire shift register matching an n grouping of '1's or
'0's in the select shift register. Likewise, the "1000..." of the reverse half of the
fire shift register must match an n grouping of '1's or '0's in the select shift register.
To continue this print pattern across the butt ends of the chips, the select shift
register in each chip should end with a complete block of n '1's (or '0's).

FIGURE 7. Fire Pattern across butt end of Print Chips

(Figure labels: Left Print Head Fire/Select SR ...1110000000....0001111111...111;
Right Print Head Fire/Select SR 1111111....1110000000 0001111111.)



Since the two chips can be of different lengths, initialisation of these patterns is
difficult. This is solved by building initialisation circuitry into the chips. This circuit is
controlled by three registers: nlen (14 bits), count (14 bits) and b (1 bit). These registers
are loaded serially through Data[0], clocked by FrClk, while LSyncL is low and ReadL is high.



FIGURE 8. Fire Pattern Generation

(Figure labels: count; FS_INIT, clocked by frclk, a gated FrClk; serial load path enabled
by Scan; fire shift register, clocked by FsClk, a gated FrClk; select shift register,
clocked by SelClk, a gated FrClk.)

The scan order from the input is n[13-0], c[0-13]; therefore b is shifted in last.
The following table shows the values to programme the bi-lithic head pairs using a fire
pattern length of 100.

TABLE 4. Head Combinations Initialisation for n=100

Nozzles  Nozzles  nlen (A & B)  countA =            bA  bB  rem =          countB =
LA       LB       = n-1         (LA/2) mod n - 1            (LB/2) mod n   (LA-LB+rem) mod n - 1
9744     4080     99            71                  0   0   40             3
8328     5496     99            63                  0   0   48             79
6912     6912     99            55                  0   0   56             55

The calculation assumes head 'A' is the longer head of the pair, and that the registers are
initialised with LA FrClk cycles (ReadL='0', LSyncL='1'). rem would be the correct value for
countB if chip B were clocked (FrClk) only LB times, but this chip will be overclocked by
LA-LB cycles. The values of bA and bB are either the same as or the inverse of each other;
the actual value does not matter. They need to be different from each other if the select
shift registers would otherwise end up with different values at the butt ends. If (LA/2n)
is even (and countA is non-zero), then the final run in 'A's select shift register will be
!bA. If (LA-LB/2) mod n is even (and countB is non-zero), then the final run in 'B's select
shift register will be !bB.
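The Table 4 formulas can be evaluated directly. The sketch below assumes head A is the longer head and n = 100, as in the table; `init_values` is a hypothetical helper name. Taken literally, the formulas give -1 whenever a mod term is 0; none of the table rows hits that case, and the sketch does not special-case it.

```python
from typing import Dict


def init_values(la: int, lb: int, n: int = 100) -> Dict[str, int]:
    """nlen/count values from Table 4 for a head pair with LA >= LB nozzles."""
    if la < lb:
        raise ValueError("head A must be the longer head of the pair")
    rem = (lb // 2) % n
    return {
        "nlen": n - 1,
        "countA": (la // 2) % n - 1,
        "rem": rem,
        "countB": (la - lb + rem) % n - 1,
    }


for la, lb in [(9744, 4080), (8328, 5496), (6912, 6912)]:
    print(la, lb, init_values(la, lb))
```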



FIGURE 9. Determining Select Shift Register value

(Figure labels: Head A, LA, LA/2 select shift register length; Head B, LB, LB/2 select
shift register length.)



3.4 Profile Pattern

A profile pattern is repeated at the FrClk rate. It is expected to be a single pulse about
1 us long, but it could be a more complicated series of pulses. The actual pattern depends
on the ink type.

The following figure shows the external timing to print a line of data. In this example
the line is printed in 8 cycles of FrClk.
FIGURE 10. Timing for printing Signals

(Waveforms shown: LSyncL, ReadL.)

3.5 Interface Modes

The print heads have eight different modes, controlled by signals ReadL and LSyncL. As
seen in Figure 10, with both LSyncL and ReadL high, the chip is in normal printing mode.
Some of these modes can operate at the same time, but may interfere with the results of
the other modes.



TABLE 5. Print Head Modes

ReadL  LSyncL  Mode                                                Internal Mapping
1      1       Normal Print Mode                                   SrClk=SrClk/3
                                                                   frclk=FrClk
                                                                   SelClk=0
                                                                   FsClk=FrClk
                                                                   Scan=0
                                                                   CoreScan=0
X      0       Dot Load Mode
               • Dot latches are open, loaded with the dot shift
                 registers; they latch once LSyncL returns to 1
                 (this happens regardless of ReadL)
               • Enables the dot shift register to capture the
                 fire result
1      0       Fire Load Mode                                      SrClk=X
               • Data[0] will shift through nlen, count and b      frclk=X
                 with FrClk                                        SelClk=X
                                                                   FsClk=FrClk
                                                                   Scan=1
                                                                   CoreScan=X
0      1       Reset Nozzle Test                                   SrClk=SrClk
               • Resets the state of the nozzle test circuit       FrClk=FrClk
                                                                   SelClk=FrClk
0      1       CMOS testing mode                                   Scan=0
               • The contents of the dot shift registers are       CoreScan=1
                 serially shifted out on Data[0-1] with SrClk
0      1       Fire Initialise mode
               • The contents of the fire shift register and
                 select shift register are generated with FrClk
0      0       Temperature Output                                  SrClk=X
               • The series of Delta Sigma outputs are clocked     frclk=0
                 out on Data[0] with FrClk. The sum of these       SelClk=0
                 bits represents the temperature of the chip.      FsClk=0
                                                                   Scan=?
                                                                   CoreScan=X
0      0       Nozzle Test Output
               • The result of a nozzle test is output on Data[1].
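Because several modes in Table 5 share the same ReadL/LSyncL levels, a decoder returns a list of candidate modes rather than a single one; which of them actually does anything depends on which clocks are toggled. This is a minimal sketch, with `None` standing for the table's X (don't care); the names are illustrative.

```python
from typing import List, Optional, Tuple

# (ReadL, LSyncL, mode) rows of Table 5; None stands for "X" (don't care).
MODES: List[Tuple[Optional[int], Optional[int], str]] = [
    (1, 1, "Normal Print Mode"),
    (None, 0, "Dot Load Mode"),
    (1, 0, "Fire Load Mode"),
    (0, 1, "Reset Nozzle Test"),
    (0, 1, "CMOS testing mode"),
    (0, 1, "Fire Initialise mode"),
    (0, 0, "Temperature Output"),
    (0, 0, "Nozzle Test Output"),
]


def candidate_modes(read_l: int, lsync_l: int) -> List[str]:
    """All Table 5 modes whose ReadL/LSyncL levels match the given pin levels."""
    return [mode for r, s, mode in MODES
            if (r is None or r == read_l) and (s is None or s == lsync_l)]


print(candidate_modes(1, 0))  # ['Dot Load Mode', 'Fire Load Mode']
```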



3.5.1 Printing

Figure 10 shows the timing for normal printing. During this action, we drop out of
Normal Print Mode to Dot Load Mode between line transfers. For printing to perform
correctly, all other signals should be stable.

3.5.2 Initialising for Printing

To initialise for printing, the fire shift registers and select shift registers need to be set
up into the state shown in Figure 7. To do this, the chips are put into Fire Load Mode and
the values for nlen, count and b are serially shifted in from Data[0], clocked by FrClk.
As the two chips have separate Data lines and a common FrClk, this happens at the same
time. Once this is done, the mode is changed to Fire Initialise Mode, and further FrClk
cycles are provided to both chips. During all these operations Pr should be low, to
prevent unintentional firing of nozzles.
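The initialisation sequence can be sketched as a per-FrClk list of pin levels for the two chips. This is a hedged sketch: the helper names are hypothetical, and the serial bit order (b, then nlen[13-0], then count[0-13]) follows the DataA[0]/DataB[0] labels of Figure 11. The scan-order note under Figure 8 states the position of b differently, so the real order should be confirmed before relying on this. The mode levels come from Table 5 (Fire Load: ReadL=1, LSyncL=0; Fire Initialise: ReadL=0, LSyncL=1).

```python
from typing import List, Optional, Tuple

Cycle = Tuple[int, int, Optional[int], Optional[int], int]  # ReadL, LSyncL, DataA[0], DataB[0], Pr


def msb_first(value: int, width: int) -> List[int]:
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def lsb_first(value: int, width: int) -> List[int]:
    return [(value >> i) & 1 for i in range(width)]


def fire_load_bits(b: int, nlen: int, count: int) -> List[int]:
    """29 serial bits for one chip: b, nlen[13-0], count[0-13] (order assumed from Figure 11)."""
    return [b] + msb_first(nlen, 14) + lsb_first(count, 14)


def init_cycles(la: int, chip_a: Tuple[int, int, int],
                chip_b: Tuple[int, int, int]) -> List[Cycle]:
    bits_a, bits_b = fire_load_bits(*chip_a), fire_load_bits(*chip_b)
    # Fire Load Mode: each chip shifts its own Data[0] bits on the shared FrClk; Pr held low.
    cycles: List[Cycle] = [(1, 0, a, b, 0) for a, b in zip(bits_a, bits_b)]
    # Fire Initialise Mode: LA further FrClk cycles, data pins unused, Pr still low.
    cycles += [(0, 1, None, None, 0)] * la
    return cycles


# 9744/4080 pair from Table 4: (b, nlen, count) for each chip.
seq = init_cycles(9744, (0, 99, 71), (0, 99, 3))
print(len(seq))  # 29 load cycles + 9744 initialise cycles = 9773
```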
FIGURE 11. Initialising Print Heads

(Waveforms shown: LSyncL; ReadL; DataA[0] (bA, nlen[13-0], count[0-13]A); DataB[0]
(bB, nlen[13-0], count[0-13]B); SrClk; Pr. Phases: Fire Load Mode, then Fire Initialise
Mode for LA cycles.)



3.5.3 Nozzle Testing

Nozzle testing is done by firing a single nozzle at a time and monitoring the Data[1] pin
in Nozzle Test Output mode.

Each nozzle has a test switch which closes when its nozzle is fired. All 12 switches in a
nozzle column are connected in parallel to the following circuit.

FIGURE 12. Nozzle Test Latching Circuit

(Figure label: Testout.)




This circuit is initialised whenever LSyncL is high and ReadL is low (Reset Nozzle
Test mode). This forces all "switch nodes" low, and the feedback through the lower NOR
gate latches this value. With LSyncL low and ReadL still low (Nozzle Test Output
mode), the Testout of the first nozzle column is output on Data[1]. If any switch is
closed, the switch node of that column will be pulled up, and this will ripple through to
the output as a transition from high to low.
FIGURE 13. Nozzle Testing

(Waveforms shown: LSyncL, ReadL, Data[1], SrClk, FrClk, Pr. Phases: Set up Test, Reset
Nozzle Test Mode, Nozzle Test Output Mode, Set up Test.)



Nozzle testing requires a setup phase in order to fire only one nozzle. There are many
ways to achieve this. The simplest might be to load a single colour with 101010... through
the even nozzles, and 010101... for the odd nozzles ('0's for all other colours), and set up a
fire pattern with n = LA/2. With this fire pattern only one nozzle will fire in each Pr
pulse. After firing in Nozzle Test Output mode, a single FrClk will advance to the next
nozzle, followed by Reset and Test. After LA/2 cycles of this testing, a single SrClk will
advance the dot shift registers to set up the untested nozzles of this colour, and another
LA/2 cycles of FrClk, Reset and Test will finish testing this colour. Then repeat the test
procedure for the other colours.
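The test procedure above can be written out as a step list, which makes the cycle count explicit: LA Pr pulses per colour for a head with LA nozzles per colour. This is a hypothetical sketch of the sequence described in the text, not a driver; the step strings and the function name are illustrative, and the reset is placed before each firing, following the phase order drawn in Figure 13.

```python
from typing import List


def nozzle_test_steps(la: int, colours: int = 6) -> List[str]:
    """Step list for testing every nozzle of a head with `la` nozzles per colour."""
    half = la // 2
    steps: List[str] = []
    for colour in range(colours):
        steps.append(f"load colour {colour}: even 1010..., odd 0101..., other colours 0")
        steps.append(f"set up fire pattern with n = {half}")
        for phase in range(2):
            for _ in range(half):
                steps += ["Reset Nozzle Test", "Pr pulse (Nozzle Test Output)",
                          "read Data[1]", "FrClk"]
            if phase == 0:
                steps.append("SrClk")  # shift the dot data onto the untested nozzles
    return steps


steps = nozzle_test_steps(9744)
print(steps.count("Pr pulse (Nozzle Test Output)"))  # 58464 = 6 colours * 9744 nozzles
```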



3.5.4 Temperature Output

This mode is not well defined yet. In this mode, Data[0] will output a series of ones
and zeros clocked by FrClk. After a (currently unknown) number of FrClk cycles, the
sum of this series will represent the temperature of the chip. The clocking frequency in
this mode is expected to be in the range 10 kHz - 1 MHz.
FIGURE 14. Temperature Reading

(Waveforms shown: LSyncL, ReadL, Data[0], SrClk, FrClk, Pr.)



The frequency of FrClk and the number of cycles need to be programmable. Since this
mode cycles FrClk, the contents of the fire shift register and select shift register would
be changed; however, in this mode FrClk is disabled to these circuits, so printing can
resume without reinitialising.



3.5.5 CMOS Testing

CMOS testing is a mode meant for chip testing before the MEMS is added to the
chip. This mode allows the dot shift register to be shifted out on the Data[0-1] pins.
Much like the nozzle test mode, the nozzles are fired while LSyncL is low, but during
the firing SrClk will be cycled, and the dot shift register will load the signal that
would fire the nozzle. Once captured, the result can be shifted out.

FIGURE 15. CMOS Testing

(Waveforms shown: LSyncL, ReadL, Data, SrClk, FrClk, Pr. Phases: Set up Test, Dot Load
Mode, CMOS Test Output Mode.)



The Dot Load Mode above violates the normal printing procedure by firing the nozzles
(Pr) and modifying the dot shift register (SrClk).
4.0 Reticle Layout



To make long chips we need to stitch the CMOS (and MEMS) together by overlapping 
the reticle stepping field. The reticle will contain two areas: 



FIGURE 16. Reticle Layout 




The top edge of Area 2 (pad end) contains the pads that stitch onto the bottom edge of Area 1 (core). Area 1 contains the core array of nozzle logic, and its top edge stitches to its own bottom edge. Finally, the bottom edge of Area 2 (butt end) stitches to the top edge of Area 1. The butt end is used to complete the feedback wiring and seal the chip.

The reticle is then exposed across the wafer from bottom to top: Area 2, Area 1, Area 1, ..., Area 2. Only the pad end of the first Area 2 exposure needs to fit on the wafer; the final exposure of Area 2 only requires the butt end on the wafer.
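The stitching order for one chip can be written out as a list of exposures. In this small sketch, `n_core` (the number of Area 1 repeats, which sets the chip length) is a hypothetical parameter that this document does not specify.

```python
from typing import List


def exposure_sequence(n_core: int) -> List[str]:
    """Reticle exposures for one chip, bottom to top: the pad end of Area 2,
    n_core repeats of the Area 1 core (each stitching to the previous one),
    then the butt end of Area 2 to close the feedback wiring."""
    if n_core < 1:
        raise ValueError("a chip needs at least one core exposure")
    return (["Area 2 (pad end)"]
            + ["Area 1 (core)"] * n_core
            + ["Area 2 (butt end)"])
```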
FIGURE 17. Stepper Pattern on Wafer








4.1 TSMC U-Frame requirements

TSMC will be building us frames of 10 mm x 0.23 mm, which will be placed on either side of both Area 1 and Area 2.

TSMC requires a 6 mm area for blading between the two exposure areas. This translates to 3 mm on the reticle. Because some reticles are 2x size while most are 5x, the worst case must be used.
1 Introduction

1.1 Document History

Version | Date | Author | Comments
1.6 | 29 November 2002 | Simon Walmsley | Updated ChipA to be ChipR to match the protocols document; removed the 68k reference now that we are using LEON.
1.5 | 26 November 2002 | Simon Walmsley | Added description of storing more than a single SoPEC_id key in a PRINTER_QA. This allows the use of a multi-SoPEC system with no loss of security. Also added text to describe that batch keys can be different for each SoPEC if the indirect upgrade key protocol is used.
1.4 | 9 September 2002 | Simon Walmsley | Added section in requirements detailing the types of attacks we care about and don't care about.
1.3 | 30 August 2002 | Simon Walmsley | Changed ComCo_OEM_xxxx variables into simply xxxx variables, since that is more generic. Added text regarding ink refill. Added extra software authentication stage to prevent ComCos from fiddling with SoPEC software.
1.2 | 29 August 2002 | Simon Walmsley | Added section on how the PRINTER_QA chip gets programmed with the SoPEC_id_key.
1.1 | 28 August 2002 | Simon Walmsley | Updated to have ink and operating parameters authenticated via symmetric-key-based signatures based on a unique SoPEC_id.
1.0 | 27 August 2002 | Simon Walmsley | Updated after review.
0.2 draft | 26 August 2002 | Simon Walmsley | Changed public-key and private-key references to asymmetric and symmetric respectively, so private can now sub-refer to the private key of the asymmetric pair, or the single private symmetric key. Changed OEM_Id into ComCo_OEM_License_Id to more accurately reflect the scope of the ID.
0.1 draft | 26 August 2002 | Simon Walmsley | Initial issue.



1.2 References

[1] Silicon & Software Systems, 4-4-9-4 SoPEC Hardware Design.

[2] Silverbrook Research, 4-2-1-1 Print Engine Controller Hardware Design.

[3] Silverbrook Research, 4-3-1-2 QA Chip Technical Reference.

[4] Silverbrook Research, 4-3-1-8 QA Chip Programmer Requirements.

[5] Silverbrook Research, 4-3-1-26 Authentication Protocols.

1.3 Scope 

This document describes the basic security requirements of programs running on the SoPEC ASIC [1]. It then describes an implementation solution to the security requirements.
The described solution impacts the design of the SoPEC ASIC and also implies key management issues. The solution includes references to the QA Chip ASIC [3] and associated authentication protocols [5].

It is possible that some of the requirements and defined solution will be applicable to sys- 
tems built with the PEC ASIC [2], although such systems are beyond the scope of this 
document. 
1.4 Readership



This document is written for software engineers and system architects who are working with SoPEC, as well as PCB designers who are responsible for SoPEC-based Print Engines. A similar audience working on PEC and PEC-based Print Engines may also find this document useful.

This document is also intended to be read by those responsible for key management and by associated database designers, with regard to guiding requirements.

This document is confidential to Silverbrook Research Pty. Ltd. and its distribution outside this organisation must be covered by a non-disclosure agreement (NDA).



1.5 QA Chip Terminology

The Authentication Protocols document [5] refers to QA Chips by their function in particular protocols:

• For authenticated reads, ChipR is the QA Chip being read from, and ChipT is the QA Chip that identifies whether the data read from ChipR can be trusted.

• For replacement of keys, ChipP is the QA Chip being programmed with the new key, and ChipF is the factory QA Chip that generates the message to program the new key.

• For upgrades of data in memory vectors, ChipU is the QA Chip being upgraded, and ChipS is the QA Chip that signs the upgrade value.

Any given physical QA Chip will contain functionality that allows it to operate as an entity in some number of these protocols.

Therefore, wherever the terms ChipR, ChipT, ChipP, ChipF, ChipU and ChipS are used in this document, they refer to logical entities involved in an authentication protocol as defined in [5].

Physical QA Chips are referred to by their location. For example, each ink cartridge may contain a QA Chip referred to as an INK_QA, with all INK_QA chips being on the same physical bus. In the same way, the QA Chip inside the printer is referred to as PRINTER_QA, and will be on a separate bus from the INK_QA chips.



Confidential | November 29, 2002 | Silverbrook Research | SoPEC Security Overview | 4-4-1-3 v1.6



2 Requirements 

2.1 Security 

The basic functional security requirements are: 

• Silverbrook code and OEM program code co-existing safely 

• Silverbrook operating parameters authentication 

• OEM operating parameters authentication 

• Ink usage authentication 

Each of these is outlined in subsequent sections. 
The authentication requirements imply that:

• OEMs and end-users must not be able to replace or tamper with Silverbrook program 
code or data 

• OEMs and end-users must not be able to call unauthorized functions within Silver- 
brook code 

• End-users must not be able to replace or tamper with OEM program code or data 

• End-users must not be able to call unauthorized functions within OEM program code

• OEMs must be able to test products at their highest upgradable status, yet not be able 
to ship them outside the terms of their license 

• OEMs and end-users must not be able to directly access the print engine pipeline 
(PEP) hardware, the LSS Master (for QA Chip access) or any other peripheral block 
with the exception of operating system permitted GPIO pins and timers. 

2.1.1 Silverbrook code and OEM program code co-existing safely 

SoPEC includes a CPU that must run both Silverbrook program code and OEM program
code. The execution model envisaged for SoPEC is one where Silverbrook program code
forms an operating system (O/S), providing services such as controlling the print engine 
pipeline, interfaces to communications channels etc. The OEM program code must run in 
a form of user mode, protected from harming the Silverbrook program code. The OEM 
program code is permitted to obtain services by calling functions in the O/S, and the O/S 
may also call OEM code at specific times. For example, the OEM program code may 
request that the O/S call an OEM interrupt service routine when a particular GPIO pin is 
activated. 

A basic requirement for SoPEC, then, is a form of protection management, whereby Silverbrook and OEM program code can co-exist without the OEM program code damaging operations or services provided by the Silverbrook O/S. Since services rely on SoPEC peripherals (such as SCB, LSS Master, Timers etc.), access to these peripherals should also be restricted to Silverbrook program code only.

2.1.2 Silverbrook operating parameters authentication

A particular OEM will be licensed to run a Print Engine with a particular set of operating 
parameters (such as print speed or quality). The OEM and/or end-user can upgrade the 
operating license for a fee and thereby obtain an upgraded set of operating parameters. 

Neither the OEM nor end-user should be able to upgrade the operating parameters without 
paying the appropriate fee to upgrade the license. Similarly, neither the OEM nor end-user 
should be able to bypass the authentication mechanism via any program code on SoPEC. This implies that OEMs and end-users must not be able to tamper with or replace Silverbrook program code or data, nor be able to call unauthorized functions within Silverbrook program code.

However, the OEM must be capable of assembly-line testing the Print Engine at the 
upgraded status before selling the Print Engine to the end-user. 



2.1.3 OEM operating parameters authentication 

The OEM may provide operating parameters to the end-user independent of the Silverbrook operating parameters. For example, the OEM may want to sell a franking machine¹.



The end-user should not be able to upgrade the operating parameters without paying the appropriate fee to the OEM. Similarly, the end-user should not be able to bypass the authentication mechanism via any program code on SoPEC. This implies that end-users must not be able to tamper with or replace OEM program code or data, nor be able to tamper with the PEP blocks or service-related peripherals.



2.1.4 Ink usage authentication

Each OEM sells printers and ink to end-users according to a business model. For example, OEM1 may provide ink at $A for a $B printer, while OEM2 may sell the same featured printer at a higher price, $B+$X, and provide the ink at a cheaper price, $A-$Y. OEM1 has a business model that relies on the fact that end-users of OEM1 printers can only use OEM1 ink, and likewise OEM2 has a business model that relies on the fact that end-users of OEM2 printers can only use OEM2 ink.

It is in the interest of both OEM1 and OEM2 that end-users cannot subvert the authentication mechanism for ink. Otherwise the business models are compromised.

It is also in the interests of the Memjet Group that OEM1 and OEM2 cannot subvert the authentication mechanism for ink, since the Memjet Group provides OEMs with printers under a license agreement that the OEM will purchase ink from a designated ink supplier.

2.2 Acceptable Compromises

Since there is no protection physically built into the Memjet printheads, it is theoretically possible for someone (with enough time, money and incentive) to remove the printheads from the print engine, build their own SoPEC ASIC equivalent, write their own program code etc. It is impossible to guard against such an attack.

We are really only concerned with commercial attacks, where there is a total compromise of printer operating parameter authentication and ink usage authentication. An example of such an attack is where the Silverbrook printing O/S is replaced by one that can be downloaded from the internet, and this clone O/S allows usage of the print engine outside the license agreement. Whether the clone O/S is developed by a hacker or by a rogue OEM is not important; what matters is that any user can trivially upgrade the printer outside the terms of the license agreement.



¹ A franking machine prints stamps.
If an end-user takes the time and energy to hack the print engine and thereby succeeds in upgrading only that single print engine, without being able to use the same keys etc. on another print engine, that is an acceptable security compromise. However, this doesn't mean we have to make it simple or cheap for the end-user to accomplish this.

Software-only attacks are the most dangerous, since they can be transmitted via the internet and have no perceived cost. Physical modification attacks are far less problematic, since most printer users are not likely to want their print engine to be physically modified. This is even more true if the cost of the physical modification is likely to exceed the price of a legitimate upgrade.

Finally, it should be noted that all OEMs are bound by license agreements that specify 
penalties if they attempt to reverse engineer or bypass the print engines. In countries 
where these agreements are enforceable by law, this at least provides a modicum of secu- 
rity. 

2.3 Implementation Constraints 

Any solution to the requirements detailed in Section 2.1 must also meet certain implemen- 
tation constraints. These are: 

• No flash memory inside SoPEC 

• SoPEC must be simple to verify 

• Silverbrook program code must be updateable 

• OEM program code must be updateable 

• Must be bootable from activity on USB or ISI 

• No extra pins for assigning IDs to slave SoPECs 

• Cannot trust the comms channel to the QA Chip in the printer (PRINTER_QA) 

• Cannot trust the comms channel to the QA Chip in the ink cartridges (INK_QA)

• Cannot trust the ISI comms channel 
These constraints are detailed below. 

2.3.1 No flash memory inside SoPEC 

SoPEC is intended to be implemented in 0.13 micron or smaller. Flash memory will not be
available in any of the target processes being considered. Although Virage have a process 
independent flash cell, it is very large and effectively impractical for anything more than a 
few bits. 

2.3.2 SoPEC must be simple to verify 

All combinatorial logic and embedded program code within SoPEC must be verified 
before manufacture. Every increase in complexity in either of these increases verification 
effort and increases risk.

2.3.3 Silverbrook program code must be updateable 

It is neither possible nor desirable to write a single complete operating system that is:

• verified completely (see Section 2.3.2)

• correct for all possible future uses of SoPEC systems

• finished in time for SoPEC manufacture 
Therefore the complete Silverbrook program code must not permanently reside on SoPEC. It must be possible to update the Silverbrook program code as enhancements to functionality are made and bug fixes are applied.

In the worst case, only new printers would receive the new functionality or bug fixes. In the best case, existing SoPEC users can download new embedded code to enable functionality or bug fixes. Ideally, these same users would be obtaining these updates from the OEM website or equivalent, and would not require any interaction with Silverbrook.

2.3.4 OEM program code must be updateable

Given that each OEM will be writing specific program code for printers that have not yet
been conceived, it is impossible for all OEM program code to be embedded in SoPEC at 
the ASIC manufacture stage. 

Since flash memory is not available (see Section 2.3.1), OEMs cannot store their program 
code in on-chip flash. While it is theoretically possible to store OEM program code in 
ROM on SoPEC, this would entail OEM-specific ASICs, which would be prohibitively
expensive. Therefore OEM program code cannot permanently reside on SoPEC. 

Since OEM program code must be downloadable for SoPEC to execute, it should there- 
fore be possible to update the OEM program code as enhancements to functionality are 
made and bug fixes are applied.

In the worst case, only new printers would receive the new functionality or bug fixes. In 
the best case, existing SoPEC users can download new embedded code to enable functionality
or bug fixes. Ideally, these same users would be obtaining these updates from the
OEM website or equivalent, and not require any interaction with Silverbrook. 

2.3.5 Must be bootable from activity on USB or ISI 

SoPEC can be placed in sleep mode to save power when printing is not required. RAM is
not preserved in sleep mode. Therefore any program code and data in RAM will be lost. 
However, SoPEC must be capable of being woken up from the host when it is time to print 
again. 

In the case of a single SoPEC system, the host communicates with SoPEC via USB.

In the case of a multi-SoPEC system, the host typically communicates with the ISI Master 
chip (e.g. the ISI Master could be SoPEC, and the comms could be USB), and can send messages
to other slave SoPECs via the ISI master. The ISI master SoPEC relays these messages to 
the slaves via the ISI. 

Therefore SoPEC must be capable of being woken up by activity on either the USB or on 
the ISI. 

2.3.6 No extra pins for assigning IDs to slave SoPECs

In a single SoPEC system the host only sends data to the single SoPEC. However, in a multi-SoPEC system, each of the slaves needs to be uniquely identifiable so that the host can send data to the correct slave.

Since there is no flash on board SoPEC (Section 2.3.1), we are unable to store a slave ID (e.g. 4 bits) in each SoPEC. Moreover, any ROM in each SoPEC will be identical.
It is possible to assign n pins to allow 2^n combinations of IDs for slave SoPECs. However, a design goal of SoPEC is to minimize pins for cost reasons, and this is particularly true of features only used in multi-SoPEC systems. We have 2 pins for inter-SoPEC communications, and further pins would add to the cost.

The design constraint is therefore to allow slaves to be IDed via a method that does not require any extra pins. This implies that whichever boot mechanism satisfies the security requirements of Section 2.1 must also be able to assign IDs to slave SoPECs.

2.3.7 Cannot trust the comms channel to the QA Chip in the printer (PRINTER_QA)

If the printer operating parameters are stored in the non-volatile memory of the Print Engine's on-board PRINTER_QA chip, neither Silverbrook nor OEM program code can rely on the communication channel being secure. It is possible for an end-user to replace the PRINTER_QA chip or subvert the communications channel.

2.3.8 Cannot trust the comms channel to the QA Chip in the ink cartridges (INK_QA)

The amount of ink remaining for a given ink cartridge is stored in the non-volatile memory of that ink cartridge's INK_QA chip. Neither Silverbrook nor OEM program code can rely on the communication channel to the INK_QA being secure. It is possible for an end-user to replace the INK_QA chip or subvert the communications channel.

2.3.9 Cannot trust the ISI comms channel

In a multi-SoPEC system, or in a single-SoPEC system that has a non-USB connection to 
the host, a given SoPEC will receive its data over the ISI. It is quite possible for an
end-user to insert a chip that subverts the communications channel (for example performs 
man-in-the-middle attacks). 
3 Proposed Solution

A proposed solution to the requirements of Section 2 can be summarised as:

• Each SoPEC has a unique id 

• CPU with user/supervisor mode 

• Memory Management Unit 

• Specific entry points in O/S

• Boot procedure, including authentication of program code and operating parameters 

• SoPEC ISI identification 



3.1 Each SoPEC has a unique id 

Each SoPEC needs to contain a unique SoPEC_id of minimum size 64 bits¹. This SoPEC_id is used to form a symmetric key unique to each SoPEC: SoPEC_id_key.

The verification of operating parameters and ink usage depends on SoPEC_id being difficult to determine. Difficult to determine means that someone should not be able to determine the id via software, or by viewing the communications between chips on the board. If the SoPEC_id is available through running a test procedure on specific test pins on the chip, then, depending on the ease with which this can be done, it is likely to be acceptable.

It is important to note that in the proposed solution, compromise of the SoPEC_id leads only to compromise of the operating parameters and ink usage on this particular SoPEC. It does not compromise any other SoPEC, or all inks or operating parameters in general.

Ideally the SoPEC_id would be random, although this is unlikely to occur with standard ASIC manufacturing processes. However, if the id lies within a small range, it can be broken by brute force. This is why 32 bits is not sufficient protection.
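A quick calculation shows why the width of the id matters. This sketch assumes a hypothetical attacker testing 10^9 candidate ids per second and a worst case where every value must be tried. At that assumed rate, a 32-bit space falls in roughly 4.3 seconds, while a 64-bit space takes roughly 585 years. An id confined to a small range shrinks the effective width in the same way.

```python
# Assumed attacker throughput: purely illustrative, not from the specification.
TRIALS_PER_SECOND = 1e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_seconds(id_bits: int,
                              rate: float = TRIALS_PER_SECOND) -> float:
    """Worst-case time to try every possible id of the given bit width."""
    return 2 ** id_bits / rate


for bits in (32, 64):
    s = exhaustive_search_seconds(bits)
    print(f"{bits}-bit id: {s:.3g} s, i.e. {s / SECONDS_PER_YEAR:.3g} years")
```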



3.2 CPU with user/supervisor mode

SoPEC contains a CPU with direct hardware support for user and supervisor modes. At present, the intended CPU is the LEON (a 32-bit processor with an instruction set according to the IEEE-1754 standard, which is compatible with the SPARC V8 instruction set).

Silverbrook (operating system) program code will run in supervisor mode, and all OEM program code will run in user mode.

3.3 Memory Management Unit

SoPEC contains a Memory Management Unit (MMU) that limits access to regions of DRAM by defining read, write and execute access permissions for supervisor and user mode. Program code running in user mode is subject to user mode permission settings, and program code running in supervisor mode is subject to supervisor mode settings.



¹ On IBM's CU-11 process this chip ID is 80 bits.
A setting of 1 for a permission bit means that type of access (e.g. read, write, execute) is permitted. A setting of 0 for a permission bit means that type of access is not permitted.

At reset and whenever SoPEC wakes up, the settings for all the permission bits are 1 for all supervisor mode accesses, and 0 for all user mode accesses. This means that supervisor mode program code must explicitly set user mode access to be permitted on a section of DRAM.

Access to all non-valid address space should be trapped, regardless of user or supervisor mode, and regardless of whether the access is a read, execute, or write.
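A behavioural model can make the reset defaults and the trapping rule concrete. The following sketch assumes one read/write/execute permission set per mode. The `VALID_RANGES` address map is hypothetical, because this document does not give the SoPEC memory map.

```python
from dataclasses import dataclass

# Hypothetical address map: only these ranges count as valid in this model.
VALID_RANGES = [(0x0000_0000, 0x0004_0000)]


class AccessTrap(Exception):
    """Raised wherever the real MMU would trap."""


@dataclass
class Perm:
    read: bool
    write: bool
    execute: bool


# State at reset and on every wake-up: supervisor may do anything, and user
# mode may do nothing until supervisor code explicitly grants access.
SUPERVISOR_RESET = Perm(True, True, True)
USER_RESET = Perm(False, False, False)


def check(addr: int, access: str, user_mode: bool) -> None:
    """Trap non-valid addresses in either mode, then apply mode permissions.
    `access` is one of "read", "write" or "execute"."""
    if not any(lo <= addr < hi for lo, hi in VALID_RANGES):
        raise AccessTrap(f"non-valid address {addr:#010x}")
    perm = USER_RESET if user_mode else SUPERVISOR_RESET
    if not getattr(perm, access):
        mode = "user" if user_mode else "supervisor"
        raise AccessTrap(f"{access} denied in {mode} mode")
```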

Access permission to all of the valid non-DRAM address space (for example the PEP blocks) is supervisor read/write access only (no supervisor execute access, and no user mode access at all), with the exception that certain GPIO and Timer registers can also be accessed by user code. These registers will require bitwise access permissions. Each peripheral block will determine how its access is restricted.

The embedded DRAM should start at 0x0000.0000 to support programmable exception vectors. The reset exception vector (and possibly some others) is translated in the MMU to point to the appropriate location in ROM, ideally in a manner that still allows null pointer dereferencing to be trapped.

With respect to the DRAM and PEP subsystems of SoPEC, typically we would set user read/write/execute mode permissions to be 1/1/0 only in the region of memory that is used for OEM program data, 1/0/1 for regions of OEM program code, and 0/0/0 elsewhere. By contrast, we would typically set supervisor mode read/write/execute permissions for this memory to be 1/1/0 (to avoid accidentally executing user code in supervisor mode).
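The typical settings above can be expressed as a small region table. In this sketch, the region boundaries are invented examples. Only the read/write/execute triples come from the text, and supervisor settings outside the OEM regions are left unspecified because the text does not give them.

```python
from typing import Dict, List, Optional, Tuple

RWX = Tuple[int, int, int]   # (read, write, execute) permission bits

# Hypothetical layout: the address ranges are illustrative only.
REGIONS: List[Tuple[str, int, int]] = [
    ("oem_data", 0x0001_0000, 0x0002_0000),
    ("oem_code", 0x0002_0000, 0x0003_0000),
]

# Typical user-mode settings from the text: OEM data 1/1/0, OEM code 1/0/1,
# and 0/0/0 everywhere else.
USER_PERMS: Dict[str, RWX] = {"oem_data": (1, 1, 0), "oem_code": (1, 0, 1)}
USER_DEFAULT: RWX = (0, 0, 0)

# Supervisor settings over the same OEM memory: 1/1/0, so user code is never
# executed accidentally in supervisor mode.
SUPERVISOR_OEM: RWX = (1, 1, 0)


def region_of(addr: int) -> Optional[str]:
    for name, start, end in REGIONS:
        if start <= addr < end:
            return name
    return None


def permissions(addr: int, user_mode: bool) -> Optional[RWX]:
    """Return (r, w, x) for an address; None where the text gives no setting."""
    region = region_of(addr)
    if user_mode:
        return USER_PERMS[region] if region else USER_DEFAULT
    return SUPERVISOR_OEM if region else None
```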

The SoPEC_id parameter (see Section 3.1) should only be accessible in supervisor mode, and should only be stored and manipulated in a region of memory that has no user mode access.

3.4 Specific entry points in O/S 

Given that user mode program code cannot even call functions in supervisor code space, the question arises as to how OEM programs can access functions or request services. The implementation for this depends on the CPU.

On the LEON processor, the TRAP instruction allows programs to switch between user 
and supervisor mode in a controlled way. The TRAP switches between user and supervi- 
sor register sets, and calls a specific entry point in the supervisor code space in supervisor 
mode. The TRAP handler dispatches the service request, and then returns to the caller in 
user mode. 

Use of a command dispatcher allows the O/S to provide services that filter access, e.g. a generalised print function will set PEP registers appropriately and ensure QA Chip ink updates occur.

The LEON also allows supervisor mode code to call user mode code. There are a number of ways that this functionality can be implemented.
3.5 Boot Procedure

3.5.1 Basic premise



The intention is to load the Silverbrook and OEM program code into SoPEC's RAM, where it can subsequently be executed. The basic SoPEC, therefore, must be capable of downloading program code. However, SoPEC must be able to guarantee that only authorized Silverbrook boot programs can be downloaded; otherwise anyone could modify the O/S to do anything and then download it, thereby bypassing the licensed operating parameters.

We perform authentication of program code and data using asymmetric cryptography and 
without using a QA Chip. 

Assuming we have already downloaded some data and a 160-bit signature into eDRAM, the boot loader needs to perform the following tasks:

• perform SHA-1 on the downloaded data to calculate a digest localDigest

• perform asymmetric decryption on the downloaded signature (160 bits) using an asymmetric public key to obtain authorizedDigest

• if localDigest = authorizedDigest, then the downloaded data is authorized (the signature must have been signed with the asymmetric private key) and control can then be passed to the downloaded data

Asymmetric decryption is used instead of symmetric decryption because the decrypting key must be held in SoPEC's ROM. If symmetric private keys were used, the ROM could be probed and the security compromised.

The procedure requires the following data item:

• bootOkey = an n-bit asymmetric public key

The procedure also requires the following two functions:

• SHA-1 = a function that performs SHA-1 on a range of memory and returns a 160-bit digest

• decrypt = a function that performs asymmetric decryption of a message using the passed-in key

Assuming that all of these are available (e.g. in the boot ROM), boot loader 0 can be defined as in the following pseudocode:

localDigest ← SHA-1(data)
authorizedDigest ← decrypt(sig, bootOkey)
If (localDigest = authorizedDigest)
    Jump to program code at data-start address    // will never return
Else
    // program code is unauthorized
EndIf



The length of the key will depend on the asymmetric algorithm chosen. The key must provide protection equivalent to that of the entire QA Chip system: if the Silverbrook O/S program code can be bypassed, then it is equivalent to the QA Chip keys being compromised. In fact it is worse, because it would compromise Silverbrook operating parameters, OEM operating parameters, and ink authentication by software downloaded off the net (e.g. from some hacker in Norway).
In the case of RSA, a 2048-bit key is required to match the 160-bit symmetric-key security of the QA Chip. In the case of ECDSA, a key length of 132 bits is likely to suffice.

There is also no advantage to storing multiple keys in SoPEC and having the external 
message choose which key to validate against, because a compromise of any key allows 
the external user to always select that key. Even if there were 2 keys, one for USB-based 
booting and one for ISI-based booting, a compromise of the USB-based booting key is 
enough to compromise all the single SoPEC systems (99% of SoPEC systems). 

Therefore the entire security of SoPEC is based on keeping the asymmetric private key paired to bootOkey secure. The entire security of SoPEC is also based on keeping secure the program that signs (i.e. authorizes) datasets using the asymmetric private key paired to bootOkey.

If a compromise is discovered, it may be economically feasible to change the bootOkey value in SoPEC's ROM, since this is only a single mask change, and would be easy to verify and characterize.



3.5.2 Hierarchies of authentication

Given that test programs, evaluation programs, and Silverbrook O/S code need to be written and tested, and that OEM program code etc. also needs to be tested, it is not secure to have a single authentication of a monolithic dataset combining Silverbrook O/S, non-O/S, and OEM program code. We certainly don't want OEMs signing Silverbrook program code, and Silverbrook shouldn't have to be involved with the signing of OEM program code.

Therefore we require differing levels of authentication, and therefore a number of keys, although the procedure for authentication is identical to the first: a section of program code contains the key for authenticating the next.

This method allows for any hierarchy of authentication, based on a root key of bootOkey. For example, assume that we have the following entities:

• SoPECCo, Silverbrook's SoPEC hardware / software company. Supplies SoPEC ASICs and SoPEC O/S printing software to a ComCo.

• ComCo, a company that assembles Print Engines from SoPECs, Memjet printheads etc., customizing the Print Engine for a given OEM according to a license.

• OEM, a company that uses a Print Engine to create a printer product to sell to the end-users. The OEM would supply the motor control logic, user interface, and casing.

The levels of authentication hierarchy are as follows:

• SoPECCo generates dataset1, consisting of the print engine O/S (which incorporates the print engine functionality) and the ComCo's asymmetric public key. SoPECCo signs dataset1 with SoPECCo's asymmetric private key (the private key paired to bootOkey). The print engine program code expects to see an operating parameter block signed by the ComCo's asymmetric private key.

• The ComCo generates dataset3, consisting of dataset1 plus dataset2, where dataset2 is an operating parameter block for a given OEM's print engine license (according to the print engine license arrangement) signed with the ComCo's asymmetric private key. The operating parameter block (dataset2) would contain valid print speed ranges, a PrintEngineLicenseId, and the OEM's asymmetric public key. The ComCo can generate as many of these operating parameter blocks as it likes for any number of Print Engine Licenses, but cannot write or sign any supervisor O/S program code.

• The OEM would generate dataset5, consisting of dataset3 plus dataset4, where dataset4 is the OEM program code signed with the OEM's asymmetric private key. The OEM can produce as many versions of dataset5 as it likes (e.g. for testing purposes or for updates to drivers etc.).

The relationship is shown below in Figure 1.




(Diagram: dataset5 is supplied to the end-user.)

Figure 1. Relationship between the datasets

When the end-user uses dataset5, SoPEC itself validates dataset1 via the bootOkey mechanism described in Section 3.5.1. Once dataset1 is executing, it validates dataset2, and uses dataset2 data to validate dataset4. The validation hierarchy is shown in Figure 2.



SoPEC boot ROM (includes bootOkey public key)
    | validation via bootOkey
    v
dataset1: operating system (includes ComCo public key)
    | validation via ComCo key
    v
dataset2: operating parms (includes OEM public key)
    | validation via OEM key
    v
dataset4: OEM program code

Figure 2. Validation hierarchy
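The validation hierarchy is a chain of trust. Each dataset is checked with the key carried by the dataset above it, and the first is checked with the root key in boot ROM. In this sketch, dataset1 and dataset2 together play the role of dataset3, and adding dataset4 gives dataset5. As in the boot loader sketch, unpadded textbook RSA over Mersenne primes is an assumed, insecure stand-in for the unspecified asymmetric algorithm. The byte encoding of embedded keys is also hypothetical.

```python
import hashlib
from typing import List, Optional, Tuple

Key = Tuple[int, int]                         # (modulus, exponent)
Dataset = Tuple[bytes, Optional[Key], int]    # (body, embedded next key, signature)


def make_keypair(p: int, q: int, e: int = 65537) -> Tuple[Key, Key]:
    """Toy textbook-RSA key pair (public, private) from two known primes."""
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))


def digest(blob: bytes) -> int:
    return int.from_bytes(hashlib.sha1(blob).digest(), "big")


def sign(blob: bytes, private: Key) -> int:
    n, d = private
    return pow(digest(blob), d, n)


def verify(blob: bytes, sig: int, public: Key) -> bool:
    n, e = public
    return pow(sig, e, n) == digest(blob)


def key_bytes(key: Optional[Key]) -> bytes:
    """Hypothetical encoding of an embedded public key (empty if none)."""
    return b"" if key is None else f"{key[0]}:{key[1]}".encode()


def make_dataset(body: bytes, next_key: Optional[Key], signer: Key) -> Dataset:
    """Sign the body together with the key it carries for the next level."""
    return body, next_key, sign(body + key_bytes(next_key), signer)


def validate_chain(root: Key, chain: List[Dataset]) -> bool:
    """Validate each dataset with the key embedded in the previous one;
    the first is validated with the root key held in boot ROM."""
    key: Optional[Key] = root
    for body, next_key, sig in chain:
        if key is None or not verify(body + key_bytes(next_key), sig, key):
            return False
        key = next_key
    return True


def mersenne(p: int) -> int:
    return 2**p - 1   # known primes for these exponents; insecure as RSA factors


BOOT_PUB, BOOT_PRIV = make_keypair(mersenne(89), mersenne(127))    # bootOkey
COMCO_PUB, COMCO_PRIV = make_keypair(mersenne(61), mersenne(107))
OEM_PUB, OEM_PRIV = make_keypair(mersenne(521), mersenne(607))

dataset1 = make_dataset(b"print engine O/S", COMCO_PUB, BOOT_PRIV)
dataset2 = make_dataset(b"operating parameters", OEM_PUB, COMCO_PRIV)
dataset4 = make_dataset(b"OEM program code", None, OEM_PRIV)
```

Because each level's key is embedded in signed data from the level above, the OEM's private key can sign new versions of dataset4 but cannot produce a dataset1 that the boot ROM accepts.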
If a key is compromised, it compromises all subsequent authorizations down the hierarchy.
In the example from above (and as illustrated in Figure 2) if the OEM's asymmetric pri- 
vate key is compromised, then O/S program code is not compromised since it is above 
OEM program code in the authentication hierarchy. However if the ComCo's asymmetric 
private key is compromised, then the OEM program code is also compromised. A com- 
promise of bootOkey compromises everything up to SoPEC itself, and would require a 
mask ROM change in SoPEC to fix. 
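The "downward only" effect of a key compromise can be stated as a lookup over the levels of Figure 2. This is a sketch; compromised_by is a hypothetical helper and the level names are copied from Figure 2.

```python
# Levels of Figure 2, top to bottom: (key whose private half signs the level, level)
HIERARCHY = [
    ("bootOkey", "dataset1: O/S program code"),
    ("ComCo key", "dataset2: operating parameters"),
    ("OEM key", "dataset4: OEM program code"),
]


def compromised_by(key: str) -> list[str]:
    """Levels an attacker could forge with the given private key: its own level and all below."""
    names = [name for name, _ in HIERARCHY]
    start = names.index(key)              # raises ValueError for an unknown key name
    return [level for _, level in HIERARCHY[start:]]


print(compromised_by("OEM key"))    # ['dataset4: OEM program code']
print(compromised_by("ComCo key"))  # ['dataset2: operating parameters', 'dataset4: OEM program code']
```

A leaked ComCo key lets an attacker sign a dataset2 carrying their own "OEM" public key, which is why everything below that level falls with it.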

It is worthwhile repeating that in any hierarchy the security of the entire hierarchy is
based on keeping the asymmetric private key paired to bootOkey secure. It is also a
requirement that the program that signs (i.e. authorizes) datasets using the asymmetric
private key paired to bootOkey is kept secure.

3.5.3 Authenticating operating parameters 

Operating parameters need to be considered in terms of Silverbrook operating parameters
and OEM operating parameters. Both sets of operating parameters are stored on the
PRINTER_QA chip (physically located inside the printer). This allows the printer to
maintain parameters regardless of being moved to different computers, or a
loss/replacement of host O/S drivers etc.

On PRINTER_QA, memory vector M0 contains the upgradable operating parameters, and
memory vectors M1+ contain any constant (non-upgradable) operating parameters.

Considering only Silverbrook operating parameters for the moment, there are actually two 
problems: 

a. setting and storing the Silverbrook operating parameters, which should be 
authorized only by Silverbrook 

b. reading the parameters into SoPEC, which is an issue of SoPEC authenticating
the data on the PRINTER_QA chip since we don't trust PRINTER_QA.

The PRINTER_QA chip therefore contains the following symmetric keys:

• K0 = PrintEngineLicense_key. This key is constant for all SoPECs supplied for a given
print engine license agreement between an OEM and a Silverbrook ComCo. K0 has
write permissions to the Silverbrook upgradeable region of M0 on PRINTER_QA.

• K1 = SoPEC_id_key. This key is unique for each SoPEC (see Section 3.1), and is
known only to the SoPEC and PRINTER_QA. K1 does not have write permissions for
anything.

K0 is used to solve problem (a). It is only used to authenticate the actual upgrades of the
operating parameters. Upgrades are performed using the standard upgrade protocol
described in [5], with PRINTER_QA acting as the ChipU, and the external upgrader acting
as the ChipS.

K1 is used by SoPEC to solve problem (b). It is used to authenticate reads of data (i.e. the
operating parameters) from PRINTER_QA. The procedure follows the standard authenticated
read protocol described in [5], with PRINTER_QA acting as ChipR, and the embedded
supervisor software on SoPEC acting as ChipT. The authenticated read protocol [5]
requires the use of a 160-bit nonce, which is a pseudo-random number. This creates the
problem of introducing pseudo-randomness into SoPEC that is not readily determinable
by OEM programs, especially given that SoPEC boots into a known state. One possibility
is to use the same random number generator as in the QA Chip (a 160-bit maximal-length
linear feedback shift register) with the seed taken from the value in the
WatchDogTimer register in SoPEC's timer unit when the first page arrives.
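The nonce source suggested above can be sketched as a 160-bit Fibonacci LFSR seeded from a WatchDogTimer sample. This sketch rests on stated assumptions: the tap positions (160, 159, 142, 141) are taken from published maximal-length tap tables such as Xilinx application note XAPP052, not from this document, and the QA Chip's real polynomial may differ. The names seed_from_watchdog and next_nonce are hypothetical. The sketch also shows why the seed matters: from a known boot state the sequence is fully deterministic.

```python
WIDTH = 160
# Assumed feedback taps (1-indexed) for a maximal-length 160-bit LFSR, from
# published tap tables; the QA Chip's actual polynomial is not given here.
TAPS = (160, 159, 142, 141)
MASK = (1 << WIDTH) - 1


def lfsr_step(state: int) -> int:
    """One shift of a Fibonacci LFSR: XOR of the tap bits becomes the new LSB."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> (tap - 1)) & 1
    return ((state << 1) & MASK) | feedback


def seed_from_watchdog(watchdog_value: int) -> int:
    # The all-zero state is a lock-up state for an XOR LFSR, so never seed with 0.
    seed = watchdog_value & MASK
    return seed if seed != 0 else 1


def next_nonce(state: int) -> int:
    """Clock the register WIDTH times so every bit of the 160-bit nonce is refreshed."""
    for _ in range(WIDTH):
        state = lfsr_step(state)
    return state


state = seed_from_watchdog(0x1234ABCD)   # e.g. WatchDogTimer value when the first page arrives
nonce1 = next_nonce(state)
nonce2 = next_nonce(nonce1)
print(hex(nonce1), hex(nonce2))
```

Because the map is a bijection on non-zero states, a non-zero seed never reaches the lock-up state; the same seed always reproduces the same nonces.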
Note that the procedure for verifying reads of data from PRINTER_QA does not rely on
Silverbrook's key K0. This means that precisely the same mechanism can be used to read
and authenticate the OEM data also stored in PRINTER_QA. Of course this must be done
by Silverbrook supervisor code so that SoPEC_id_key is not revealed.

If the OEM also requires upgradable parameters, we can add an extra key to
PRINTER_QA, where that key is an OEM_key and has write permissions to the OEM
part of M0.

In this way, K1 never needs to be known by anyone except the SoPEC and PRINTER_QA.

Each printing SoPEC in a multi-SoPEC system needs access to a PRINTER_QA chip that
contains the appropriate SoPEC_id_key to validate ink usage and operating parameters.
This can be accomplished by a separate PRINTER_QA for each SoPEC, or by adding
extra keys (multiple SoPEC_id_keys) to a single PRINTER_QA.

However, if ink usage is not being validated (e.g. if print speed were the only Silverbrook
upgradable parameter) then not all SoPECs require access to a PRINTER_QA chip that
contains the appropriate SoPEC_id_key. Assuming that OEM program code controls the
physical motor speed (different motors per OEM), then the PHI within the first (or only)
front-page SoPEC can be programmed to accept (or generate) line sync pulses at a
particular rate. If line syncs arrived faster than the particular rate, the PHI would generate a
buffer underrun. This would mean that even if the motor speed was hacked to be fast, the
print would terminate.
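The PHI-based rate limit described above amounts to enforcing a minimum spacing between line syncs. The following is a minimal sketch. It assumes line sync timestamps in microseconds and a licensed maximum line rate, and print_continues and min_line_period_us are hypothetical names. In SoPEC the check is a consequence of the PHI hardware rather than a software routine.

```python
def min_line_period_us(licensed_lines_per_second: float) -> float:
    """Shortest allowed time between line syncs at the licensed rate."""
    return 1_000_000 / licensed_lines_per_second


def print_continues(line_sync_times_us: list[float], licensed_lines_per_second: float) -> bool:
    """False as soon as two consecutive line syncs arrive closer than the licensed period."""
    min_period = min_line_period_us(licensed_lines_per_second)
    for earlier, later in zip(line_sync_times_us, line_sync_times_us[1:]):
        if later - earlier < min_period:
            return False   # the PHI cannot keep up: buffer underrun, print terminates
    return True


print(print_continues([0, 100, 200, 300], 10_000))   # True  (exactly at the licensed rate)
print(print_continues([0, 100, 180], 10_000))        # False (motor hacked to run fast)
```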

3.5.3.1 OEM assembly-line test

As described in Section 2.1.2, Silverbrook operating parameters include such items as
print speed, print quality etc. and are tied to a license provided to an OEM. These
parameters are under Silverbrook control. The licensed Silverbrook operating parameters are
stored in the PRINTER_QA as described in Section 3.5.3.

However, although an OEM should only be able to sell the licensed operating parameters for
a given Print Engine, they must be able to assembly-line test¹ the Print Engine with a
different set of operating parameters, i.e. a maximally upgraded Print Engine.

Several different mechanisms can be employed to allow OEMs to test the upgraded
capabilities of the Print Engine. At present it is unclear exactly what kind of assembly-line tests
would be performed.

At first thought, it might be considered that a dongle-style approach using a special master
PRINTER_QA containing upgraded parameters might be a solution. However, for the
SoPEC to accept the parameters as true, the special master PRINTER_QA must contain
the appropriate SoPEC_id_key (tied to the specific SoPEC_id of the SoPEC system under
test). Since the OEM can't perform the key upgrade, they must make use of a Silverbrook
upgrade mechanism, which implies either a Silverbrook box, or a connection to a
Silverbrook machine (e.g. over a network). Neither approach is good.



¹ This section is referring to assembly-line testing rather than development testing. An OEM can maximally upgrade a
given Print Engine to allow developmental testing of their own OEM program code & mechanics.
If there is no special master PRINTER_QA for testing, then we must make use of special
test programs, or storage on the PRINTER_QA, or both. The solution will depend on the
test requirements of the OEM.

A simple test program that allows any pages to be printed at full upgrade capability is def- 
initely not secure. If it gets out to the public, it is effectively a free upgrade. Silverbrook 
would not want the OEM to have such a program. 

Likewise, if a test program only printed pages that had been signed with some key, then
not only does this change the timing of real pages (since the signature must be verified
before printing) but a service must be made available to sign special test images. This may
be possible, but it means that Silverbrook must be involved every time a test image is
designed. If Silverbrook gives the OEM a program that generates signatures to avoid this
annoyance, then it is the same as giving away the ability to print at full capability.

If the OEM requires tests that do not actually print dots, then a test harness software
loader can be used that looks and behaves like a real Print Engine (including all output
signals etc.) at full upgrade capability, except that it does not produce physical dots (i.e. at
the very end of the pipeline, the data is masked before being sent out to the printhead). This
will produce a timing-accurate result, and is the simplest, most effective solution. If the
special driver gets out into the public, the user can only print blank pages.

If the OEM requires tests that actually print dots, there are several possibilities:

a. A version of the O/S (signed for the OEM) that will only print special Silverbrook
test patterns. This may be quite adequate, but it has the disadvantage that
OEM test patterns cannot be printed.

b. A version of the O/S that prints garbage in special places over the test image.
Again this has the disadvantage that special OEM test patterns cannot be
printed.

c. A version of the O/S that reads and decrements a DecrementOnly value in
PRINTER_QA. If the value before successful decrementing is non-zero, then
the O/S will run at full upgrade capability until either a power-loss or a
predetermined number of pages (e.g. agreed to by the OEM and Silverbrook)
have been printed. The number to be stored in the PRINTER_QA at initial
PRINTER_QA customization may only need to be 1 or 2.

Of these solutions, option (c) is probably the least restrictive to the OEM while still being
useful. If the test program gets out, then if the value in PRINTER_QA is 0 after testing,
there is no impact, and if the value is small, then only a small number of pages can be
printed at full upgrade capability, and power must stay on while doing so.
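Option (c) can be sketched as follows. The DecrementOnly field stub, the pages_per_credit limit and all class and method names are hypothetical stand-ins. On real hardware the field would be read and decremented through the QA Chip protocols [5]. The full-capability allowance lives only in volatile O/S state, which is why a power-loss ends it.

```python
class PrinterQAStub:
    """Hypothetical stand-in for the DecrementOnly test-credit field in PRINTER_QA."""

    def __init__(self, test_credits: int) -> None:
        self.test_credits = test_credits

    def read_and_decrement(self) -> int:
        """Return the value before decrementing; a DecrementOnly value never goes below 0."""
        before = self.test_credits
        if before > 0:
            self.test_credits = before - 1
        return before


class AssemblyTestOS:
    """Hypothetical O/S build for option (c). Its allowance is held in RAM only,
    so a power-loss (modelled here as constructing a fresh instance) ends it."""

    def __init__(self, qa: PrinterQAStub, pages_per_credit: int) -> None:
        credit_available = qa.read_and_decrement() > 0
        self.full_capability_pages_left = pages_per_credit if credit_available else 0

    def print_page(self) -> str:
        if self.full_capability_pages_left > 0:
            self.full_capability_pages_left -= 1
            return "full upgrade capability"
        return "licensed operating parameters"


qa = PrinterQAStub(test_credits=1)
first_boot = AssemblyTestOS(qa, pages_per_credit=2)
print([first_boot.print_page() for _ in range(3)])
# ['full upgrade capability', 'full upgrade capability', 'licensed operating parameters']
second_boot = AssemblyTestOS(qa, pages_per_credit=2)   # after a power cycle
print(second_boot.print_page())                         # licensed operating parameters
```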



3.5.4 Use of a PrintEngineLicense_id

Silverbrook O/S program code contains the OEM's asymmetric public key to ensure that
the subsequent OEM program code is authentic, i.e. from the OEM. However, given that
SoPEC only contains a single root key, it is theoretically possible for different OEMs'
applications to be run on identical physical Print Engines, i.e. a printer driver for OEM1
could run on an identical physical Print Engine from OEM2.

To guard against this, the Silverbrook O/S program code contains a
PrintEngineLicense_id code (e.g. 16 bits) that matches the same named value stored as a
fixed operating parameter in the PRINTER_QA (i.e. in M1+). As with all other operating
parameters, the value of PrintEngineLicense_id would be stored in PRINTER_QA at the
same time as the other various PRINTER_QA customizations are being applied, before
being shipped to the OEM site.

In this way, the OEMs can be sure of differentiating themselves through software
functionality.

3.5.5 Authentication of ink 

The Silverbrook O/S must perform ink authentication [5] during prints. Ink usage authen- 
tication makes use of counters in SoPEC that keep an accurate record of the exact number 
of dots printed for each ink. 

The ink amount remaining in a given cartridge is stored in that cartridge's INK_QA chip.
Other data stored on the INK_QA chip includes ink color, viscosity, Memjet firing pulse
profile information, as well as licensing parameters such as OEM_Id, InkType,
InkUsageLicense_Id, etc. This information is typically constant, and is therefore likely to
be stored in M1+ within INK_QA.

Just as the Print Engine operating parameters are validated by means of PRINTER_QA, a
given Print Engine license may only be permitted to function with specifically licensed
ink. Therefore the software on SoPEC could contain a valid set of ink types, colors,
OEM_Ids, InkUsageLicense_Ids etc. for subsequent matching against the data in the
INK_QA.

SoPEC must be able to authenticate reads from the INK_QA, both in terms of ink parame- 
ters as well as ink remaining. 

To authenticate ink a number of steps must be taken:
• restrict access to dot counts

• authenticate ink usage and ink parameters via INK_QA and PRINTER_QA 

• broadcast ink dot usage to all SoPECs in a multi-SoPEC system 

3.5.5.1 Restrict access to dot counts

Since the dot counts are accessed via the PHI in the PEP section of SoPEC, access to these 
registers (and more generally all PEP registers) must be only available from supervisor 
mode, and not by OEM code (running in user mode). Otherwise it might be possible for 
OEM program code to clear dot counts before authentication has occurred. 

3.5.5.2 Authenticate ink usage and ink parameters via INK_QA and PRINTER_QA

The basic problem of authentication of ink remaining and other ink data boils down to the
problem that we don't trust INK_QA. Therefore, how can a SoPEC know the initial value
of ink (or the ink parameters), and how can a SoPEC know that, after a write to the
INK_QA, the count has been correctly decremented?

Taking the first issue, which is determining the initial ink count or the ink parameters, we 
need a system whereby a given SoPEC can perform an authenticated read of the data in 
INK_QA. 

We cannot write the SoPEC_id_key to the INK_QA for two reasons:

• updating keys is not power-safe (i.e. if power is removed mid-update, the INK_QA
could be rendered useless)
• the ink cartridge would then not work in another printer since the other printer would
not know the old SoPEC_id_key (knowledge of the old key is required in order to
change the old key to a new one).

The proposed solution is to let INK_QA have two keys:

• K0 = SupplyInkLicense_key. This key is constant for all ink cartridges for a given ink
supply agreement between an OEM and a Silverbrook ComCo (this is not the same
key as PrintEngineLicense_key, which is stored in PRINTER_QA). K0 has write
permissions to the ink remaining regions of M0 on INK_QA.

• K1 = UseInkLicense_key. This key is constant for all ink cartridges for a given ink
usage agreement between an OEM and a Silverbrook ComCo (this is not the same key
as PrintEngineLicense_key, which is stored in PRINTER_QA). K1 has no write
permissions to anything.

K0 is used to authenticate the actual upgrades of the amount of ink remaining (e.g. to fill
and refill the amount of ink). Upgrades are performed using the standard upgrade protocol
described in [5], with INK_QA acting as the ChipU, and the external upgrader acting as
the ChipS. The fill and refill upgrader (ChipS) also needs to check the appropriate ink
licensing parameters such as OEM_Id, InkType and InkUsageLicense_Id for validity.

K1 is used to allow SoPEC to authenticate reads of the ink remaining and any other ink
data. This is accomplished by having the same UseInkLicense_key within PRINTER_QA
(e.g. in K2), also with no write permissions.

This means there are two shared keys, with PRINTER_QA sharing both, and thereby
acting as a bridge between INK_QA and SoPEC.

• UseInkLicense_key is shared between INK_QA and PRINTER_QA

• SoPEC_id_key is shared between SoPEC and PRINTER_QA

All SoPEC has to do is do an authenticated read [5] from INK_QA, pass the data / signature
to PRINTER_QA, let PRINTER_QA validate the data / signature, and then get
PRINTER_QA to produce a similar signature based on the shared SoPEC_id_key. SoPEC
can then compare PRINTER_QA's signature with its own calculated signature (i.e. implement
a Test function [5] in software on the SoPEC), and if the signatures match, the data
from INK_QA must be valid, and can therefore be trusted.

Once the data from INK_QA is known to be trusted, the amount of ink remaining can be
checked, and the other ink licensing parameters such as OEM_Id, InkType,
InkUsageLicense_Id can be checked for validity.

The actual steps of read authentication as performed by SoPEC are: 



KEY1 ← 1    // simple constants to specify which key to use when signing
KEY2 ← 2

R_PRINTER ← PRINTER_QA.random()
R_INK, M_INK, SIG_INK ← INK_QA.read(KEY1, R_PRINTER)    // read with key1: UseInkLicense_key
R_SOPEC ← random()
SIG_PRINTER ← PRINTER_QA.test(KEY2, R_INK, M_INK, SIG_INK, KEY1, R_SOPEC)
SIG_SOPEC ← HMAC_SHA_1(R_PRINTER | R_SOPEC | M_INK)    // computed with SoPEC_id_key
If ((SIG_PRINTER ≠ 0) AND (SIG_PRINTER = SIG_SOPEC))
    // M_INK (data read from INK_QA) is valid
    // M_INK could be ink parameters, such as InkUsageLicense_Id, or ink remaining
    If (M_INK.inkRemaining = expectedInkRemaining)
        // all is ok
    Else
        // the ink value is not what we wrote, so don't print anything anymore
    EndIf
Else
    // the data read from INK_QA is not valid and cannot be trusted
EndIf
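The read-authentication steps can be exercised end to end with a small Python model. This is a sketch under assumptions: the exact message formats of the read and test functions are defined in [5] and are not reproduced here, so this model simply computes HMAC-SHA1 over the concatenation of the nonces and data. The class names, key-slot numbering and sopec_authenticated_read are hypothetical. What the model does show faithfully is the bridge: SoPEC never holds UseInkLicense_key, yet it can trust M_INK because PRINTER_QA only re-signs data it has verified.

```python
import hashlib
import hmac
import secrets

NONCE_BYTES = 20        # 160-bit nonces, as the text requires
ZERO_SIG = bytes(20)    # stands in for the "0" returned by a failed test


def hmac_sha1(key: bytes, *parts: bytes) -> bytes:
    # '|' in the pseudocode is modelled as plain concatenation (assumption)
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


class InkQA:
    """Hypothetical INK_QA model: key slot 1 holds UseInkLicense_key."""

    def __init__(self, use_ink_license_key: bytes, m_ink: bytes) -> None:
        self.keys = {1: use_ink_license_key}
        self.m_ink = m_ink

    def read(self, key_id: int, r_in: bytes):
        r_ink = secrets.token_bytes(NONCE_BYTES)
        return r_ink, self.m_ink, hmac_sha1(self.keys[key_id], r_in, r_ink, self.m_ink)


class PrinterQA:
    """Hypothetical PRINTER_QA model: slot 1 = SoPEC_id_key, slot 2 = UseInkLicense_key."""

    def __init__(self, sopec_id_key: bytes, use_ink_license_key: bytes) -> None:
        self.keys = {1: sopec_id_key, 2: use_ink_license_key}
        self.r_printer = b""

    def random(self) -> bytes:
        self.r_printer = secrets.token_bytes(NONCE_BYTES)
        return self.r_printer

    def test(self, in_key, r_ink, m_ink, sig_ink, out_key, r_out) -> bytes:
        expected = hmac_sha1(self.keys[in_key], self.r_printer, r_ink, m_ink)
        if not hmac.compare_digest(expected, sig_ink):
            return ZERO_SIG
        # re-sign the verified data with the key shared with SoPEC
        return hmac_sha1(self.keys[out_key], self.r_printer, r_out, m_ink)


def sopec_authenticated_read(sopec_id_key: bytes, printer_qa: PrinterQA, ink_qa: InkQA):
    KEY1, KEY2 = 1, 2
    r_printer = printer_qa.random()
    r_ink, m_ink, sig_ink = ink_qa.read(KEY1, r_printer)
    r_sopec = secrets.token_bytes(NONCE_BYTES)
    sig_printer = printer_qa.test(KEY2, r_ink, m_ink, sig_ink, KEY1, r_sopec)
    sig_sopec = hmac_sha1(sopec_id_key, r_printer, r_sopec, m_ink)
    if sig_printer != ZERO_SIG and hmac.compare_digest(sig_printer, sig_sopec):
        return m_ink          # trusted
    return None               # cannot be trusted


use_key, sopec_key = secrets.token_bytes(20), secrets.token_bytes(20)
ink = InkQA(use_key, b"inkRemaining=9400")
printer = PrinterQA(sopec_key, use_key)
print(sopec_authenticated_read(sopec_key, printer, ink))   # b'inkRemaining=9400'
```

A cartridge that does not hold the matching UseInkLicense_key, or a SoPEC whose key differs from the one in PRINTER_QA, yields None.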



Strictly speaking, we don't need a nonce (R_SOPEC) because M_INK (containing
the ink remaining) should be decrementing between authentications. However, we do need
one to retrieve the initial amount of ink and the other ink parameters (at power up). This is
why taking a random number from the WatchDogTimer at the receipt of the first page is
acceptable.

In summary, the SoPEC performs the non-authenticated write [5] of ink remaining to the
INK_QA chip, and then performs an authenticated read of the data via the PRINTER_QA
as per the pseudocode above. If the value is authenticated, and the INK_QA ink-remaining
value matches the expected value, the count was correctly decremented and the printing
can continue.

3.5.5.3 Broadcast ink dot usage to all SoPECs in a multi-SoPEC system

In a multi-SoPEC system, each SoPEC attached to a printhead (4 at most) must broadcast 
its ink usage to all the SoPECs. In this way, each SoPEC will have its own version of the 
expected ink usage. 

In the case of a man-in-the-middle attack, at worst the count in a given SoPEC is only its
own count (i.e. all broadcasts are turned into 0 ink usage by the man-in-the-middle).

A single SoPEC performs the update of ink remaining to the INK_QA chip, and then all
SoPECs perform an authenticated read of the data via the appropriate PRINTER_QA (the
PRINTER_QA that contains their matching SoPEC_id_key; remember that multiple
SoPEC_id_keys can be stored in a single PRINTER_QA). If the value is authenticated,
and the INK_QA value matches the expected value, the count was correctly decremented
and the printing can continue.

If any of the broadcasts are not received, or have been tampered with, the updated ink
counts will not match. The only case this does not cater for is if each SoPEC is tricked (via
an ISI man-in-the-middle attack) into a total that is the same, yet not the true total. Apart
from the fact that this is not viable for general pages, at worst this is the maximum amount
of ink printed by a single SoPEC. We don't care about protecting against this case.
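The effect of a tampered broadcast can be sketched by computing each SoPEC's expected ink remaining from its own count plus the counts it received. This minimal sketch uses hypothetical names. For simplicity it assumes the SoPEC that writes INK_QA wrote the true total, and it models a man-in-the-middle as zeroing selected (receiver, sender) broadcasts.

```python
def expected_views(initial: int, usages: list[int],
                   tampered: frozenset[tuple[int, int]] = frozenset()) -> list[int]:
    """Each SoPEC's expected ink remaining: initial minus its own usage and received broadcasts.
    A (receiver, sender) pair in `tampered` means that broadcast arrived as 0."""
    views = []
    for receiver, own_usage in enumerate(usages):
        received = sum(0 if (receiver, sender) in tampered else usage
                       for sender, usage in enumerate(usages) if sender != receiver)
        views.append(initial - own_usage - received)
    return views


def sopecs_that_continue(ink_qa_remaining: int, views: list[int]) -> list[bool]:
    """A SoPEC keeps printing only if the authenticated INK_QA value matches its view."""
    return [view == ink_qa_remaining for view in views]


usages = [100, 200, 300]            # ink used by each printing SoPEC since the last update
true_remaining = 10000 - sum(usages)
print(sopecs_that_continue(true_remaining, expected_views(10000, usages)))
# [True, True, True]
print(sopecs_that_continue(true_remaining, expected_views(10000, usages, frozenset({(0, 2)}))))
# [False, True, True]
```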

Since there will be at most 4 printing SoPECs, at most 4 authenticated reads are required.
This should be completed within 0.5 seconds, well within the 2 seconds/page print time.

3.5.6 Example hierarchy 

The exact breakdown of hierarchy will depend on a later investigation, but for the pur- 
poses of scoping out possibilities, it is worthwhile considering an example hierarchy for 
illustrative purposes. 
Adding an extra bootloader step to the example from Section 3.5.2, we can break up the
contents of program space into logical sections, as shown in Table 1. Note that the ComCo
does not provide any program code, merely operating parameters that are used by the O/S.



Table 1. Sections of Program Space

Section | Contents | Verifies
0 (ROM) | boot loader 0; SHA-1 function; asymmetric decrypt function; bootOkey | section 1 via bootOkey
1 | boot loader 1; SoPEC_OS_public_key | section 2 via SoPEC_OS_public_key
2 | Silverbrook O/S program code; function to generate SoPEC_id_key from SoPEC_id; Basic Print Engine; ComCo_public_key | section 3 via ComCo_public_key; section 4 via OEM_public_key (supplied in section 3); PRINTER_QA data, which includes the PrintEngineLicense_id, Silverbrook operating parameters, and OEM operating parameters (all authenticated via SoPEC_id_key)
3 | ComCo license agreement operating parameter ranges, including PrintEngineLicense_id (gets loaded into supervisor mode section of memory); OEM_public_key (gets loaded into supervisor mode section of memory); any ComCo written user-mode program code (gets loaded into user mode section of memory) | is used by section 2 to verify section 4 and range of parameters as found in PRINTER_QA
4 | OEM specific program code | OEM operating parameters via calls to Silverbrook O/S code


The verification procedures will be required each time the CPU is woken up, since the 
RAM is not preserved. 



3.5.7 What if the CPU is not fast enough?

In the example of Section 3.5.6, every time the CPU is woken up to print a document it
needs to perform:

• SHA-1 on all program code and program data

• 4 sets of asymmetric decryption to load the program code and data

• 1 HMAC-SHA1 generation per 512 bits of Silverbrook and OEM printer and ink operating
parameters

Although the SHA-1 and HMAC process will be fast enough on the embedded CPU (the
program code will be executing from ROM), it may be that the asymmetric decryption
will be slow. And this becomes more likely with each extra level of authentication. If this
is the case (as is likely), hardware acceleration is required.

A cheap form of hardware acceleration takes advantage of the fact that in most cases the
same program is loaded each time, with the first time likely to be at power-up. The
hardware acceleration is simply data storage for the authorizedDigest, which means that the
boot procedure now is:
bootloader0(data, sig)
    localDigest ← SHA-1(data)
    If (localDigest = previouslyStoredAuthorizedDigest)
        jump to program code at data-start address    // will never return
    Else
        authorizedDigest ← decrypt(sig, bootOkey)
        If (localDigest = authorizedDigest)
            previouslyStoredAuthorizedDigest ← authorizedDigest
            jump to program code at data-start address    // will never return
        Else
            // program code is unauthorized
        EndIf
    EndIf


This procedure means that a reboot of the same authorized program code will only require
SHA-1 processing. At power-up, or if new program code is loaded (e.g. an upgrade of a
driver over the internet), then the full authorization via asymmetric decryption takes place.
This is because the stored digest will not match at power-up and whenever a new program
is loaded.
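The cached-digest boot procedure can be sketched in Python as below. The asymmetric decryption is passed in as a function because the scheme is not specified here. In the usage example, a stub that maps a known signature to its digest stands in for decryption with bootOkey. Class and method names are hypothetical, and returning True models the jump to program code.

```python
import hashlib
from typing import Callable

DIGEST_BYTES = 20  # SHA-1 digest size (160 bits)


class DigestCachingBootLoader:
    """previously_stored_authorized_digest models the small preserved register/RAM
    area that survives CPU sleep but not power-on reset."""

    def __init__(self) -> None:
        self.previously_stored_authorized_digest = bytes(DIGEST_BYTES)
        self.asymmetric_decryptions = 0

    def power_on_reset(self) -> None:
        # contents are 0 (registers) or garbage (RAM); either way they won't match
        self.previously_stored_authorized_digest = bytes(DIGEST_BYTES)

    def boot(self, data: bytes, sig: bytes, decrypt: Callable[[bytes], bytes]) -> bool:
        """Return True where the real loader would jump to the program code."""
        local_digest = hashlib.sha1(data).digest()
        if local_digest == self.previously_stored_authorized_digest:
            return True                      # fast path: SHA-1 only
        self.asymmetric_decryptions += 1
        authorized_digest = decrypt(sig)     # slow path: decrypt(sig, bootOkey)
        if local_digest == authorized_digest:
            self.previously_stored_authorized_digest = authorized_digest
            return True
        return False                         # program code is unauthorized


image = b"Silverbrook O/S v1"
good_sig = b"signature over v1"


def stub_decrypt(sig: bytes) -> bytes:
    # Stub standing in for RSA decryption with bootOkey (not real cryptography).
    return hashlib.sha1(image).digest() if sig == good_sig else bytes(DIGEST_BYTES)


loader = DigestCachingBootLoader()
loader.boot(image, good_sig, stub_decrypt)   # full check
loader.boot(image, good_sig, stub_decrypt)   # cached: SHA-1 only
print(loader.asymmetric_decryptions)         # 1
```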

The question is how much preserved space is required. 

Each digest requires 160 bits (20 bytes), and this is constant regardless of the asymmetric 
encryption scheme or the key length. While it is possible to reduce this number of bits, 
thereby sacrificing security, the cost is small enough to warrant keeping the full digest. 

However, each level of boot loader requires its own digest to be preserved. This gives a
maximum of 20 bytes per loader. Digests for operating parameters and ink levels may also
be preserved in the same way, although these authentications should be fast enough not to
require cached storage.

Assuming SoPEC provides for 12 digests (to be generous), this is a total of 240 bytes.
These 240 bytes could easily be stored as 60 x 32-bit registers, or probably more
conveniently as a small amount of RAM (e.g. 0.25 - 1 Kbyte). Providing something like 1 Kbyte of
RAM has the advantage of allowing the CPU to store other useful data, although this is not
a requirement.

In general, it is useful for the boot ROM to know whether it is being started up due to
power-on reset or activity on the USB/ISI. In the former case, it can ignore the previously
stored values (either 0 for registers or garbage for RAM). In the latter case, it can use the
previously stored values. Even without this, a startup value of 0 (or garbage) means the
digest won't match and therefore the authentication will occur implicitly.



At power-up, the host can send targeted data to the USB-connected SoPEC, but can only
send broadcasts to all of the slave SoPECs via the USB-connected SoPEC's ISI.

Each slave SoPEC will verify the broadcast message received over the ISI, and if it is
valid, will execute it. Several levels of authorization may occur. However, at some stage,
this common program code (broadcast to all of the slave SoPECs and signed by the
appropriate asymmetric private key) will, among other things, set the slave SoPEC's ISI id. If
there is only 1 slave, the id is given, but if there is more than 1 slave, the id must be
determined in some fashion.






3.6 SoPEC ISI IDENTIFICATION
On a particular physical arrangement of SoPECs each slave SoPEC will have a different
set of connections on GPIOs. For example, one SoPEC may be in charge of motor control,
while another may be driving the LEDs etc. The unused GPIO pins (not necessarily the
same on each SoPEC) can be set as inputs and then tied to 0 or 1. As long as the
connection settings are mutually exclusive, program code can determine which is which, and the
id can be appropriately set.

In some multi-SoPEC systems, a given SoPEC will only be attached to a single printhead
(left or right). We can conveniently use the second printhead connection pins (temperature
and test) to form an ISI id.

This scheme of slave SoPEC identification does not introduce a security breach. If an
attacker rewires the pinouts to confuse identification, at best it will simply cause strange
printouts (e.g. swapping of printout data) to occur, while at worst the Print Engine will
simply not function.

Note that some physical setting (e.g. pins) on each of the multiple SoPECs is required; the
settings just need to be mutually exclusive. Although it is possible for all the SoPECs to
come to a logical ISI id assignment (e.g. by using Ethernet-like protocols), the ISI id needs
to be very much a physical identity scheme. This is because these SoPECs are not simply
logical processors: we want the correct portion of the page to be printed on the correct
physical location, motor controls will be physically connected to a specific physical
SoPEC etc.
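Deriving an ISI id from tied-off pins amounts to reading them as a binary number. This sketch assumes the pins are read as 0/1 levels with the first pin as the least significant bit. The pin assignment and function names are hypothetical. The mutual-exclusivity check mirrors the requirement stated above.

```python
def isi_id_from_pins(pin_levels: list[int]) -> int:
    """Read tied-off input pins as a binary number; pin_levels[0] is the least significant bit."""
    isi_id = 0
    for bit_position, level in enumerate(pin_levels):
        if level not in (0, 1):
            raise ValueError("each pin must read 0 or 1")
        isi_id |= level << bit_position
    return isi_id


def ids_are_mutually_exclusive(boards: list[list[int]]) -> bool:
    """True if every SoPEC's pin settings decode to a different ISI id."""
    ids = [isi_id_from_pins(levels) for levels in boards]
    return len(set(ids)) == len(ids)


# e.g. the temperature and test pins of an unused second printhead connection
print(isi_id_from_pins([1, 0]), isi_id_from_pins([0, 1]))    # 1 2
print(ids_are_mutually_exclusive([[1, 0], [0, 1], [1, 1]]))  # True
```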

3.7 Setting up QA Chip keys 

In use, each INK_QA chip needs the following keys:

• K0 = SupplyInkLicense_key

• K1 = UseInkLicense_key

Each PRINTER_QA chip tied to a specific SoPEC requires the following keys:

• K0 = PrintEngineLicense_key

• K1 = SoPEC_id_key

• K2 = UseInkLicense_key

Note that there may be more than one K1 depending on the number of PRINTER_QA
chips and SoPECs in a system. These keys need to be appropriately set up in the QA Chips
before they will function correctly together.
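The key assignments above can be written down as plain data, which also makes PRINTER_QA's role as the bridge between INK_QA and SoPEC (Section 3.5.5.2) easy to check. The strings below are key names, not key values, and the dictionaries are purely illustrative.

```python
# Key slot -> key name (names only; the key values are known only to QACo).
INK_QA_KEYS = {"K0": "SupplyInkLicense_key", "K1": "UseInkLicense_key"}
PRINTER_QA_KEYS = {"K0": "PrintEngineLicense_key", "K1": "SoPEC_id_key", "K2": "UseInkLicense_key"}
SOPEC_KEYS = {"SoPEC_id_key"}  # SoPEC generates this from its SoPEC_id


def shared_keys(a: set[str], b: set[str]) -> set[str]:
    return a & b


ink = set(INK_QA_KEYS.values())
printer = set(PRINTER_QA_KEYS.values())
print(shared_keys(ink, printer))         # {'UseInkLicense_key'}
print(shared_keys(printer, SOPEC_KEYS))  # {'SoPEC_id_key'}
print(shared_keys(ink, SOPEC_KEYS))      # set(), so SoPEC must go through PRINTER_QA
```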

3.7.1 Original QA Chips as received by a ComCo 

When original QA Chips are shipped from QACo to a specific ComCo their keys are as 
follows: 

• K0 = QACo_ComCo_Key0

• K1 = QACo_ComCo_Key1

• K2 = QACo_ComCo_Key2

• K3 = QACo_ComCo_Key3

All 4 keys are only known to QACo. Note that these keys are different for each QA Chip. 

3.7.2 Steps at the ComCo 

The ComCo is responsible for making Print Engines out of Memjet printheads, QA Chips,
PECs or SoPECs, PCBs etc.
In addition, the ComCo must customize the INK_QA chips and PRINTER_QA chip
on-board the print engine before shipping to the OEM.

There are two stages: 

• replacing the keys in QA Chips with specific keys for the application (i.e. INK_QA
and PRINTER_QA)

• setting operating parameters as per the license with the OEM 

3.7.2.1 Replacing keys 

The ComCo is issued QID hardware [4] by QACo that allows programming of the various
keys (except for K1) in a given QA Chip to the final values, following the standard
ChipF/ChipP replace key (indirect version) protocol [5]. The indirect version of the protocol
allows each QACo_ComCo_Key to be different for each SoPEC.

In the case of programming of PRINTER_QA's K1 to be SoPEC_id_key, there is the
additional step of transferring an asymmetrically encrypted SoPEC_id_key (by the public-key)
along with the nonce (R_P) used in the replace key protocol to the device that is functioning
as a ChipF. The ChipF must decrypt the SoPEC_id_key so it can generate the standard
replace key message for PRINTER_QA (functioning as a ChipP in the ChipF/ChipP
protocol). The asymmetric key pair held in the ChipF equivalent should be unique to a
ComCo (but still known only by QACo) to prevent damage in the case of a compromise.

Note that the various keys installed in the QA Chips (both INK_QA and PRINTER_QA)
are only known to the QACo. The OEM only uses QIDs and QACo supplied ChipFs. The
replace key protocol [5] allows the programming to occur without compromising the old
or new key.

3.7.2.2 Setting operating parameters

There are two sets of operating parameters stored in PRINTER_QA and INK_QA:

• fixed 

• upgradable 

The fixed operating parameters can be written to by means of non-authenticated writes
[5] via a QID [4], and permission bits set such that they are ReadOnly.

The upgradable operating parameters can only be written to after the QA Chips have been
programmed with the correct keys as per Section 3.7.2.1. Once they contain the correct
keys they can be programmed with appropriate operating parameters by means of a QID
and an appropriate ChipS (containing matching keys).
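The difference between fixed and upgradable parameters can be sketched as two regions with different write rules. This is a simplified model with hypothetical names. The ChipS authorization is reduced to an HMAC-SHA1 over the new value keyed with K0, whereas the real upgrade protocol [5] also involves nonces. The field names PrintEngineLicense_id and maxPrintSpeed are examples only.

```python
import hashlib
import hmac


def upgrade_mac(k0: bytes, name: str, value: int) -> bytes:
    """Simplified upgrade authorization: HMAC-SHA1 over 'name=value' keyed with K0."""
    return hmac.new(k0, f"{name}={value}".encode(), hashlib.sha1).digest()


class QAParameterStore:
    """Hypothetical model of a QA Chip's fixed (M1+) and upgradable (M0) parameters."""

    def __init__(self, k0: bytes) -> None:
        self._k0 = k0
        self.fixed: dict[str, int] = {}
        self.fixed_read_only = False
        self.upgradable: dict[str, int] = {}

    def write_fixed(self, name: str, value: int) -> None:
        # non-authenticated write, only allowed until the permission bits are set
        if self.fixed_read_only:
            raise PermissionError(f"{name} is ReadOnly")
        self.fixed[name] = value

    def set_fixed_read_only(self) -> None:
        self.fixed_read_only = True

    def upgrade(self, name: str, value: int, mac: bytes) -> None:
        # only a ChipS holding the matching K0 can produce an acceptable mac
        if not hmac.compare_digest(upgrade_mac(self._k0, name, value), mac):
            raise PermissionError("upgrade not authorized")
        self.upgradable[name] = value


k0 = b"example PrintEngineLicense_key"
qa = QAParameterStore(k0)
qa.write_fixed("PrintEngineLicense_id", 0x1234)
qa.set_fixed_read_only()
qa.upgrade("maxPrintSpeed", 30, upgrade_mac(k0, "maxPrintSpeed", 30))  # ChipS with matching K0
print(qa.fixed, qa.upgradable)   # {'PrintEngineLicense_id': 4660} {'maxPrintSpeed': 30}
```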
3 Introduction



This document describes the SoPEC ASIC (Small office home office Print Engine Controller) suitable for
use in price sensitive SoHo printer products. The SoPEC ASIC is intended to be a low cost solution for
bi-lithic printhead control, replacing the multichip solutions in larger more professional systems with a single
chip. The increased cost competitiveness is achieved by integrating several systems such as a modified
PEC1 [1] printing pipeline, CPU control system, peripherals and memory sub-system onto one SoC ASIC,
reducing component count and simplifying board design.

This section will give a general introduction to Memjet printing systems, introduce the components that
make a bi-lithic printhead system, describe possible system architectures and show how several SoPECs
can be used to achieve A3 and A4 duplex printing. The section "SoPEC ASIC" describes the SoC SoPEC
ASIC, with subsections describing the CPU, DRAM and Print Engine Pipeline subsystems. Each section
gives a detailed description of the blocks used and their operation within the overall print system. The final
section describes the bi-lithic printhead construction and associated implications to the system due to its
makeup.

Some sections of this document were derived from the Print Engine Controller Hardware Design
Specification [1] written by Silverbrook Research.
4 Nomenclature



4.1 BI-LITHIC PRINTHEAD NOTATION

A bi-lithic based printhead is constructed from 2 printhead ICs of varying sizes. The notation M:N is used
to express the size relationship of each IC, where M specifies one printhead IC in inches and N specifies
the remaining printhead IC in inches.

Section 35 Memjet Printhead contains a description of the bi-lithic printhead and related terminology. 



4.2 Definitions 

The following terms are used throughout this specification:

Bi-lithic printhead: Refers to a printhead constructed from 2 printhead ICs.

CPU: Refers to the CPU core, caching system and MMU.

ISI-Bridge chip: A device with a high speed interface (such as USB2.0, Ethernet or IEEE1394) and
one or more ISI interfaces. The ISI-Bridge would be the ISIMaster for each of the ISI buses it
interfaces to.

ISIMaster: The ISIMaster is the only device allowed to initiate communication on the Inter
SoPEC Interface (ISI) bus. The ISIMaster interfaces directly with the host.

ISISlave: Multi-SoPEC systems will contain one or more ISISlave SoPECs connected to the
ISI bus. ISISlaves can only respond to communication initiated by the ISIMaster.

LEON: Refers to the LEON CPU core.

LineSyncMaster: The LineSyncMaster device generates the line synchronisation pulse that all
SoPECs in the system must synchronise their line outputs to.

Multi-SoPEC: Refers to a SoPEC based print system with multiple SoPEC devices.

Netpage: Refers to a page printed with tags (normally in infrared ink).

PEC1: Refers to Print Engine Controller version 1, precursor to SoPEC, used to control
printheads constructed from multiple angled printhead segments.

Printhead IC: Single MEMS IC used to construct a bi-lithic printhead.

PrintMaster: The PrintMaster device is responsible for coordinating all aspects of the print
operation. There may only be one PrintMaster in a system.

QA Chip: Quality Assurance Chip.

Storage SoPEC: An ISISlave SoPEC used as a DRAM store and which does not print.

Tag: Refers to a pattern which encodes information about its position and orientation, which
allows it to be optically located and its data contents read.



4.3 ACRONYMS AND ABBREVIATIONS

The following acronyms and abbreviations are used in this specification:

CFU     Contone FIFO Unit
CPU     Central Processing Unit
DIU     DRAM Interface Unit
DNC     Dead Nozzle Compensator
DRAM    Dynamic Random Access Memory
DWU     DotLine Writer Unit
GPIO    General Purpose Input Output
HCU     Halftoner Compositor Unit
ICU     Interrupt Controller Unit
ISI     Inter SoPEC Interface
LDB     Lossless Bi-level Decoder
LLU     Line Loader Unit
LSS     Low Speed Serial interface
MEMS    Micro Electro Mechanical System
MMU     Memory Management Unit
PCU     SoPEC Controller Unit
PHI     PrintHead Interface
PSS     Power Save Storage Unit
RDU     Real-time Debug Unit
ROM     Read Only Memory
SCB     Serial Communication Block
SFU     Spot FIFO Unit
SMG4    Silverbrook Modified Group 4
SoPEC   Small office home office Print Engine Controller
SRAM    Static Random Access Memory
TE      Tag Encoder
TFU     Tag FIFO Unit
TIM     Timers Unit
USB     Universal Serial Bus



4.4 Pseudocode notation 

In general the pseudocode examples use C-like statements with some exceptions.

Symbol and naming conventions used for pseudocode:

//                    Comment
=                     Assignment
==, !=, <, >          Operator equal, not equal, less than, greater than
+, -, *, /, %         Operator addition, subtraction, multiply, divide, modulus
&, |, ^, <<, >>, ~    Bitwise AND, bitwise OR, bitwise exclusive OR, left shift, right shift, complement
AND, OR, NOT          Logical AND, Logical OR, Logical inversion
[XX:YY]               Array/vector specifier
{a, b, c}             Concatenation operation
++, --                Increment and decrement
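C has no direct equivalent of the [XX:YY] and {a, b, c} operators. As a minimal sketch, assuming 32-bit values and that [XX:YY] selects the inclusive bit range from bit XX (most significant) down to bit YY, a slice and a two-field concatenation can be written with shifts and masks. The helper names bit_slice and concat2 are hypothetical.

```c
#include <stdint.h>

/* Pseudocode x[msb:lsb]: extract the inclusive bit range msb..lsb of x.
   Assumes lsb <= msb <= 31. */
uint32_t bit_slice(uint32_t x, unsigned msb, unsigned lsb)
{
    unsigned width = msb - lsb + 1;
    /* Build the mask in 64 bits so that a full 32-bit slice does not overflow. */
    uint32_t mask = (uint32_t)((1ULL << width) - 1u);
    return (x >> lsb) & mask;
}

/* Pseudocode {hi, lo}: place hi above a lo field that is lo_width bits wide.
   Assumes lo_width < 32 and that the result fits in 32 bits. */
uint32_t concat2(uint32_t hi, uint32_t lo, unsigned lo_width)
{
    uint32_t lo_mask = (1u << lo_width) - 1u;
    return (hi << lo_width) | (lo & lo_mask);
}
```

For example, bit_slice(0xABCD, 15, 8) selects the upper byte 0xAB, and concat2(0xA, 0x5, 4) joins two 4-bit fields into 0xA5.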



4.4.1 Register and signal naming conventions

In general register naming uses the C style conventions with capitalization to denote word delimiters. Signals use RTL style notation where underscores denote word delimiters. There is a direct translation between both conventions. For example, the CmdSourceFifo register is equivalent to the cmd_source_fifo signal.
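The translation between the two conventions can be mechanised. The sketch below is a hypothetical helper, not part of any SoPEC tooling; it assumes every capital letter starts a new word, so a register name containing an acronym would need special handling.

```c
#include <ctype.h>
#include <stddef.h>

/* Convert a C-style register name ("CmdSourceFifo") into an RTL-style
   signal name ("cmd_source_fifo"). Writes at most out_size bytes including
   the terminating NUL and returns the number of characters written
   (excluding the NUL). Output is truncated if out_size is too small. */
size_t register_to_signal(const char *reg, char *out, size_t out_size)
{
    size_t n = 0;
    if (out_size == 0)
        return 0;
    for (size_t i = 0; reg[i] != '\0'; i++) {
        unsigned char c = (unsigned char)reg[i];
        /* A capital letter after the first character marks a word delimiter. */
        if (isupper(c) && i > 0) {
            if (n + 1 >= out_size)
                break;
            out[n++] = '_';
        }
        if (n + 1 >= out_size)
            break;
        out[n++] = (char)tolower(c);
    }
    out[n] = '\0';
    return n;
}
```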



4.5 State machine notation

State machines should be described using the pseudocode notation outlined above. State machine descriptions use the convention of underline to indicate the cause of a transition from one state to another and plain text (no underline) to indicate the effect of the transition, i.e. signal transitions which occur when the new state is entered.

A sample state machine is shown in Figure 1.



Figure 1. Example state machine notation
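Written in C, the same cause/effect convention maps onto a switch statement: the condition tested in each case is the cause (underlined in a diagram) and the assignments made on entering the new state are the effect (plain text). This is a hypothetical two-state example; the state names and the start, done and busy signals are illustrative and are not the machine drawn in Figure 1.

```c
#include <stdbool.h>

typedef enum { STATE_IDLE, STATE_ACTIVE } state_t;

typedef struct {
    state_t state;
    bool busy;      /* example output signal */
} fsm_t;

/* Reset: enter IDLE and drive the output low. */
void fsm_reset(fsm_t *m)
{
    m->state = STATE_IDLE;
    m->busy = false;
}

/* Evaluate one clock cycle with inputs start and done. */
void fsm_step(fsm_t *m, bool start, bool done)
{
    switch (m->state) {
    case STATE_IDLE:
        if (start) {                /* cause of the transition */
            m->state = STATE_ACTIVE;
            m->busy = true;         /* effect on entering ACTIVE */
        }
        break;
    case STATE_ACTIVE:
        if (done) {                 /* cause of the transition */
            m->state = STATE_IDLE;
            m->busy = false;        /* effect on entering IDLE */
        }
        break;
    }
}
```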
5 Printing Considerations

A bi-lithic printhead produces 1600 dpi bi-level dots. On low-diffusion paper, each ejected drop forms a 22.5µm diameter dot. Dots are easily produced in isolation, allowing dispersed-dot dithering to be exploited to its fullest. Since the bi-lithic printhead is the width of the page and operates with a constant paper velocity, color planes are printed in perfect registration, allowing ideal dot-on-dot printing. Dot-on-dot printing minimizes 'muddying' of midtones caused by inter-color bleed.

A page layout may contain a mixture of images, graphics and text. Continuous-tone (contone) images and graphics are reproduced using a stochastic dispersed-dot dither. Unlike a clustered-dot (or amplitude-modulated) dither, a dispersed-dot (or frequency-modulated) dither reproduces high spatial frequencies (i.e. image detail) almost to the limits of the dot resolution, while simultaneously reproducing lower spatial frequencies to their full color depth, when spatially integrated by the eye. A stochastic dither matrix is carefully designed to be free of objectionable low-frequency patterns when tiled across the image. As such, its size typically exceeds the minimum size required to support a particular number of intensity levels (e.g. 16x16x8 bits for 257 intensity levels).
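Applying a dither matrix is a per-dot threshold comparison. The following is a minimal sketch, not SoPEC's halftoner: it assumes one 8-bit contone plane already at dot resolution, one byte per output dot, and a square n x n threshold matrix tiled across the image. A stochastic matrix (for example 16x16 entries of 8 bits) would be designed offline and applied exactly like the small matrix used in the usage note.

```c
#include <stddef.h>
#include <stdint.h>

/* Threshold dithering of one contone plane into bi-level dots.
   The n x n matrix (row-major) is tiled across the image; a dot is
   printed when the contone value exceeds the threshold at that position. */
void dither_plane(const uint8_t *contone, uint8_t *dots,
                  size_t width, size_t height,
                  const uint8_t *matrix, size_t n)
{
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            uint8_t threshold = matrix[(y % n) * n + (x % n)];
            dots[y * width + x] = contone[y * width + x] > threshold;
        }
    }
}
```

With a 2x2 matrix {0, 128, 192, 64} and a flat contone value of 100, half of the dots are set, because only the thresholds 0 and 64 lie below 100.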

Human contrast sensitivity peaks at a spatial frequency of about 3 cycles per degree of visual field and then falls off logarithmically, decreasing by a factor of 100 beyond about 40 cycles per degree and becoming immeasurable beyond 60 cycles per degree [21][22]. At a normal viewing distance of 12 inches (about 300mm), this translates roughly to 200-300 cycles per inch (cpi) on the printed page, or 400-600 samples per inch according to Nyquist's theorem.
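The conversion from cycles per degree to cycles per inch can be checked with a short calculation. This sketch uses the small-angle approximation (one degree at distance D subtends about D x pi/180 inches), which is an assumption of the example rather than something stated above; the error at one degree is well under 0.1%.

```c
/* Cycles per inch on a page viewed from distance_in inches, for a
   spatial frequency given in cycles per degree of visual field. */
double cycles_per_inch(double cycles_per_degree, double distance_in)
{
    const double pi = 3.14159265358979323846;
    double inches_per_degree = distance_in * pi / 180.0;
    return cycles_per_degree / inches_per_degree;
}

/* Nyquist: at least two samples are needed per cycle. */
double nyquist_samples_per_inch(double cpi)
{
    return 2.0 * cpi;
}
```

At 12 inches, 40 and 60 cycles per degree give about 191 and 286 cpi, i.e. about 380-570 samples per inch, consistent with the rough 200-300 cpi and 400-600 samples per inch quoted above.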

In practice, contone resolution above about 300 ppi is of limited utility outside special applications such as 
medical imaging. Offset printing of magazines, for example, uses contone resolutions in the range 150 to 
300 ppi. Higher resolutions contribute slightly to color error through the dither. 

Black text and graphics are reproduced directly using bi-level black dots, and are therefore not anti-aliased (i.e. low-pass filtered) before being printed. Text should therefore be supersampled beyond the perceptual limits discussed above, to produce smoother edges when spatially integrated by the eye. Text resolution up to about 1200 dpi continues to contribute to perceived text sharpness (assuming low-diffusion paper, of course).

A Netpage printer, for example, may use a contone resolution of 267 ppi (i.e. 1600 dpi / 6), and a black text and graphics resolution of 800 dpi. A high end office or departmental printer may use a contone resolution of 320 ppi (1600 dpi / 5) and a black text and graphics resolution of 1600 dpi. Both formats are capable of exceeding the quality of commercial (offset) printing and photographic reproduction.
6 Document Data Flow



6.1 Considerations 

Because of the page-width nature of the bi-lithic printhead, each page must be printed at a constant speed to avoid creating visible artifacts. This means that the printing speed can't be varied to match the input data rate. Document rasterization and document printing are therefore decoupled to ensure the printhead has a constant supply of data. A page is never printed until it is fully rasterized. This can be achieved by storing a compressed version of each rasterized page image in memory.

This decoupling also allows the RIP(s) to run ahead of the printer when rasterizing simple pages, buying 
time to rasterize more complex pages. 

Because contone color images are reproduced by stochastic dithering, but black text and line graphics are reproduced directly using dots, the compressed page image format contains a separate foreground bi-level black layer and background contone color layer. The black layer is composited over the contone layer after the contone layer is dithered (although the contone layer has an optional black component). A final layer of Netpage tags (in infrared or black ink) is optionally added to the page for printout.

Figure 2 shows the flow of a document from computer system to printed page. 







Figure 2. Document data flow 
At 267 ppi, for example, an A4 page (8.26 inches x 11.7 inches) of contone CMYK data has a size of 26.3MB. At 320 ppi, an A4 page of contone data has a size of 37.8MB. Using lossy contone compression algorithms such as JPEG [23], contone images compress with a ratio up to 10:1 without noticeable loss of quality, giving compressed page sizes of 2.63MB at 267 ppi and 3.78MB at 320 ppi.

At 800 dpi, an A4 page of bi-level data has a size of 7.4MB. At 1600 dpi, an A4 page of bi-level data has a size of 29.5MB. Coherent data such as text compresses very well. Using lossless bi-level compression algorithms such as SMG4 fax, as discussed in Section 8.1.2.3.1, ten-point plain text compresses with a ratio of about 50:1. Lossless bi-level compression across an average page is about 20:1, with 10:1 possible for pages which compress poorly. The requirement for SoPEC is to be able to print text at 10:1 compression. Assuming 10:1 compression gives compressed page sizes of 0.74MB at 800 dpi and 2.95MB at 1600 dpi.
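The uncompressed sizes above follow from page area x resolution² x bits per pixel. The sketch below reproduces them; it assumes 1 MB = 2^20 bytes (the convention that matches the quoted figures, e.g. 26.3MB for CMYK contone at 267 ppi) and 8 bits per color plane for contone data.

```c
/* Uncompressed page size in MB (2^20 bytes) for a page of
   width_in x height_in inches at the given resolution.
   bits_per_pixel: 32 for CMYK contone, 1 for one bi-level plane. */
double page_size_mb(double width_in, double height_in,
                    double resolution, double bits_per_pixel)
{
    double pixels = (width_in * resolution) * (height_in * resolution);
    double bytes = pixels * bits_per_pixel / 8.0;
    return bytes / (1024.0 * 1024.0);
}

/* Size after compression at ratio:1. */
double compressed_mb(double size_mb, double ratio)
{
    return size_mb / ratio;
}
```

For an A4 page, page_size_mb(8.26, 11.7, 267, 32) is about 26.28 and page_size_mb(8.26, 11.7, 1600, 1) is about 29.49; dividing by 10 gives the compressed sizes quoted in the text.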

Once dithered, a page of CMYK contone image data consists of 116MB of bi-level data. Using lossless bi-level compression algorithms on this data is pointless precisely because the optimal dither is stochastic, i.e. it introduces hard-to-compress disorder.

Netpage tag data is optionally supplied with the page image. Rather than storing a compressed bi-level data layer for the Netpage tags, the tag data is stored in its raw form. Each tag is supplied with 120 bits of raw variable data (combined with up to 56 bits of raw fixed data) and covers up to a 6mm x 6mm area (at 1600 dpi). The absolute maximum number of tags on an A4 page is 15,540 when the tag is only 2mm x 2mm (each tag is 126 dots x 126 dots, for a total coverage of 148 tags x 105 tags). 15,540 tags of 128 bits per tag gives a compressed tag page size of 0.24MB.

The multi-layer compressed page image format therefore exploits the relative strengths of lossy JPEG contone image compression, lossless bi-level text compression, and tag encoding. The format is compact enough to be storage-efficient, and simple enough to allow straightforward real-time expansion during printing.

Since text and images normally don't overlap, the normal worst-case page image size is image only, while the normal best-case page image size is text only. The addition of worst case Netpage tags adds 0.24MB to the page image size. The worst-case page image size is text over image plus tags. The average page size assumes a quarter of an average page contains images. Table 1 shows data sizes for a compressed A4 page for these different options.



Table 1. Data sizes for A4 page (8.26 inches x 11.7 inches)

Page content | 267 ppi contone, 800 dpi bi-level | 320 ppi contone, 1600 dpi bi-level
Image only (contone), 10:1 compression | 2.63 MB | 3.76 MB
Text only (bi-level), 10:1 compression | 0.74 MB | 2.95 MB
Netpage tags, 1600 dpi | 0.24 MB | 0.24 MB
Worst case (text + image + tags) | 3.61 MB | 6.67 MB
Average (text + 25% image + tags) | 1.64 MB | 4.25 MB



6.2 Document Data Flow 

The Host PC rasterizes and compresses the incoming document on a page by page basis. The page is 
restructured into bands with one or more bands used to construct a page. The compressed data is then 
transferred to the SoPEC device via the USB link. A complete band is stored in SoPEC embedded mem- 
ory. Once the band transfer is complete the SoPEC device reads the compressed data, expands the band, 
normalizes contone, bi-level and tag data to 1600 dpi and transfers the resultant calculated dots to the bi- 
lithic printhead. 

The document data flow is as follows:

• The RIP software rasterizes each page description and compresses the rasterized page image.

• The infrared layer of the printed page optionally contains encoded Netpage [5] tags at a programmable density.

• The compressed page image is transferred to the SoPEC device via the USB, normally on a band by band basis.

• The print engine takes the compressed page image and starts the page expansion.

• The first stage page expansion consists of 3 operations performed in parallel:

  • expansion of the JPEG-compressed contone layer

  • expansion of the SMG4 fax compressed bi-level layer

  • encoding and rendering of the bi-level tag data.

• The second stage dithers the contone layer using a programmable dither matrix, producing up to four bi-level layers at full-resolution.

• The second stage then composites the bi-level tag data layer, the bi-level SMG4 fax decompressed layer and up to four bi-level JPEG de-compressed layers into the full-resolution page image.

• A fixative layer is also generated as required.

• The last stage formats and prints the bi-level data through the bi-lithic printhead via the printhead interface.
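The compositing step can be illustrated per dot. This is a minimal sketch under a stated assumption: where the opaque black layer has a dot, black is printed and the dithered C, M and Y dots at that position are suppressed; elsewhere the dithered contone dots (including any optional K) pass through. The plane layout (one byte per dot) and all names are hypothetical, and the tag and fixative layers are left out.

```c
#include <stddef.h>
#include <stdint.h>

enum { PLANE_C, PLANE_M, PLANE_Y, PLANE_K, NUM_PLANES };

/* Composite an opaque bi-level black layer over dithered CMYK dot planes
   in place. Each plane holds one 0/1 dot per byte. */
void composite_black(uint8_t *planes[NUM_PLANES],
                     const uint8_t *black, size_t num_dots)
{
    for (size_t i = 0; i < num_dots; i++) {
        if (black[i]) {
            /* Assumption: the opaque black dot replaces any dithered color. */
            planes[PLANE_C][i] = 0;
            planes[PLANE_M][i] = 0;
            planes[PLANE_Y][i] = 0;
            planes[PLANE_K][i] = 1;
        }
    }
}
```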

The SoPEC device can print a full resolution page with 6 color planes. Each of the color planes can be generated from compressed data through any channel (either JPEG compressed, bi-level SMG4 fax compressed, tag data generated, or fixative channel created) with a maximum number of 6 data channels from page RIP to bi-lithic printhead color planes.

The mapping of data channels to color planes is programmable. This allows multiple color planes in the printhead to map to the same data channel, providing redundancy in the printhead to assist dead nozzle compensation.

Also, a data channel could be used to gate data from another data channel. For example, in stencil mode, data from the bi-level data channel at 1600 dpi can be used to filter the contone data channel at 320 dpi, giving the effect of a 1600 dpi contone image.
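Stencil mode for one printed line can be sketched as follows. The sketch assumes the contone line is replicated by an integer scale factor (5 for 320 ppi to 1600 dpi), dithered at dot resolution against one row of a threshold matrix, and then ANDed with the full-resolution bi-level channel, so that edges follow the 1600 dpi mask. The function and parameter names are hypothetical, not SoPEC register or unit names.

```c
#include <stddef.h>
#include <stdint.h>

/* contone:    one line of contone values at 1/scale of dot resolution
   thresholds: the dither-matrix row for this line, n entries long
   mask:       the full-resolution bi-level stencil for this line
   out:        num_dots output dots (0 or 1) */
void stencil_line(const uint8_t *contone, size_t scale,
                  const uint8_t *thresholds, size_t n,
                  const uint8_t *mask, uint8_t *out, size_t num_dots)
{
    for (size_t x = 0; x < num_dots; x++) {
        uint8_t value = contone[x / scale];           /* replicate contone pixel */
        uint8_t dot = value > thresholds[x % n];      /* dither at dot resolution */
        out[x] = dot & (mask[x] != 0);                /* gate with the stencil */
    }
}
```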

6.3 Page considerations due to SoPEC 

The SoPEC device typically stores a complete page of document data on chip. The amount of storage available for compressed pages is limited to 2Mbytes, imposing a fixed maximum on compressed page size. A comparison of the compressed image sizes in Table 1 indicates that SoPEC would not be capable of printing worst case pages unless they are split into bands and printing commences before all the bands for the page have been downloaded. The page sizes in the table are shown for comparison purposes and would be considered reasonable for a professional level printing system. The SoPEC device is aimed at the consumer level and would not be required to print pages of that complexity. Target document types for the SoPEC device are shown in Table 2.



Table 2. Page content targets for SoPEC

Page content description | Calculation | Size (MB)
Best case picture image, 267 ppi with 3 colors, A4 size | 8.26x11.7x267x267x3 @ 10:1 | 1.97
Full page text, 800 dpi, A4 size | 8.26x11.7x800x800 @ 10:1 | 0.74
Mixed graphics and text (image of 6 inches x 4 inches @ 267 ppi and 3 colors; remaining area text ~73 inches², 800 dpi) | 6x4x267x267x3 @ 5:1 plus 800x800x73 @ 10:1 | 1.55
Best case photo, 3 colors, 6.6 megapixel image | 6.6 Mpixel @ 10:1 | 2.00



If a document with more complex pages is required, the page RIP software in the host PC can determine that there is insufficient memory storage in the SoPEC for that document. In such cases the RIP software can take two courses of action. It can increase the compression ratio until the compressed page size will fit in the SoPEC device, at the expense of document quality, or it can divide the page into bands and allow SoPEC to begin printing a page band before all bands for that page are downloaded. Once SoPEC starts printing a page it cannot stop; if SoPEC consumes compressed data faster than the bands can be downloaded, a buffer underrun error could occur, causing the print to fail. A buffer underrun occurs if a line synchronisation pulse is received before a line of data has been transferred to the printhead.
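The first course of action can be sketched as a search over compression ratios. This is an idealized model, not the RIP's actual algorithm: it assumes compressed size = uncompressed size / ratio, whereas real JPEG output size depends on content, so a real RIP would recompress and measure. The function name and step parameters are hypothetical.

```c
/* Raise the target compression ratio in fixed steps (step > 0) until the
   page fits in store_bytes. Returns the chosen ratio, or -1.0 if even
   max_ratio does not fit, in which case the page would have to be banded. */
double choose_ratio(double uncompressed_bytes, double store_bytes,
                    double start_ratio, double step, double max_ratio)
{
    for (double r = start_ratio; r <= max_ratio; r += step) {
        if (uncompressed_bytes / r <= store_bytes)
            return r;
    }
    return -1.0;
}
```

For the best case picture image of Table 2 (8.26x11.7x267x267x3 bytes, about 19.7MB uncompressed) and a 2Mbyte store, the first whole-number ratio that fits is 10:1.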

Other options which can be considered if the page does not fit completely into the compressed page store 
are to slow the printing or to use multiple SoPECs to print parts of the page. A Storage SoPEC (Section 
7.2.5) could be added to the system to provide guaranteed bandwidth data delivery. The print system could 
also be constructed using an ISI-Bridge chip (Section 7.2.6) to provide guaranteed data delivery. 
7 Memjet Printer Architecture

The SoPEC device can be used in several printer configurations and architectures. 

In the general sense every SoPEC based printer architecture will contain: 

• One or more SoPEC devices. 

• One or more bi-lithic printheads. 

• Two or more LSS busses. 

• Two or more QA chips. 

• USB 1.1 connection to host or ISI connection to Bridge Chip.

• ISI bus connection between SoPECs (when multiple SoPECs are used). 

Some example printer configurations are outlined in Section 7.2. The various system components are outlined briefly in Section 7.1.

7.1 System Components

7.1.1 SoPEC Print Engine Controller 

The SoPEC device contains several system on a chip (SoC) components, as well as the print engine pipe- 
line control application specific logic. 

7.1.1.1 Print Engine Pipeline (PEP) Logic

The PEP reads compressed page store data from the embedded memory, optionally decompresses the data 
and formats it for sending to the printhead. The print engine pipeline functionality includes expanding the 
page image, dithering the contone layer, compositing the black layer over the contone layer, rendering of 
Netpage tags, compensation for dead nozzles in the printhead, and sending the resultant image to the bi- 
lithic printhead. 

7.1.1.2 Embedded CPU 

SoPEC contains an embedded CPU for general purpose system configuration and management. The CPU 
performs page and band header processing, motor control and sensor monitoring (via the GPIO) and other 
system control functions. The CPU can perform buffer management or report buffer status to the host. The 
CPU can optionally run vendor application specific code for general print control such as paper ready 
monitoring and LED status update. 

7.1.1.3 Embedded Memory Buffer

A 2.5Mbyte embedded memory buffer is integrated onto the SoPEC device, of which approximately 
2Mbytes are available for compressed page store data. A compressed page is divided into one or more 
bands, with a number of bands stored in memory. As a band of the page is consumed by the PEP for printing, a new band can be downloaded. The new band may be for the current page or the next page.

Using banding it is possible to begin printing a page before the complete compressed page is downloaded, 
but care must be taken to ensure that data is always available for printing or a buffer underrun may occur. 

A Storage SoPEC acting as a memory buffer (Section 7.2.5) or an ISI-Bridge chip with attached DRAM (Section 7.2.6) could be used to provide guaranteed data delivery.
7.1.1.4 Embedded USB 1.1 Device



The embedded USB 1.1 device accepts compressed page data and control commands from the host PC, and facilitates the data transfer to either embedded memory or to another SoPEC device in multi-SoPEC systems.



7.1.2 Bi-lithic Printhead



The printhead is constructed by abutting 2 printhead ICs together. The printhead ICs can vary in size from 2 inches to 8 inches, so to produce an A4 printhead several combinations are possible. For example, two printhead ICs of 7 inches and 3 inches could be used to create an A4 printhead (the notation is 7:3). Similarly, a 6 and 4 combination (6:4) or a 5:5 combination could be used. An A3 printhead can be constructed from an 8:6 or a 7:7 printhead IC combination. For photographic printing, smaller printheads can be constructed.



7.1.3 LSS interface bus 



Each SoPEC device has 2 LSS system buses for communication with QA devices for system authentication and ink usage accounting. The number of QA devices per bus and their position in the system is unrestricted, with the exception that PRINTER_QA and INK_QA devices should be on separate LSS busses.



7.1.4 QA devices 



Each SoPEC system can have several QA devices. Normally each printing SoPEC will have an associated PRINTER_QA. Ink cartridges will contain an INK_QA chip. PRINTER_QA and INK_QA devices should be on separate LSS busses. All QA chips in the system are physically identical, with the flash memory contents distinguishing a PRINTER_QA from an INK_QA chip.



7.1.5 ISI Interface



The Inter-SoPEC Interface (ISI) provides a communication channel between SoPECs in a multi-SoPEC system. The ISIMaster can be a SoPEC device or an ISI-Bridge chip depending on the printer configuration. Both compressed data and control commands are transferred via the interface.



7.1.6 ISI-Bridge Chip 



A device, other than a SoPEC with a USB connection, which provides print data to a number of slave 
SoPECs. A bridge chip will typically have a high bandwidth connection, such as USB2.0, Ethernet or 
IEEE1394, to a host and may have an attached external DRAM for compressed page storage. A bridge 
chip would have one or more ISI interfaces. The use of multiple ISI buses would allow the construction of 
independent print systems within the one printer. The ISI-Bridge would be the ISIMaster for each of the 
ISI buses it interfaces to. 



7.2 Possible SoPEC Systems 



Several possible SoPEC based system architectures exist. The following sections outline some possible architectures. It is possible to have extra SoPEC devices in the system used for DRAM storage. The QA chip configurations shown are indicative of the flexibility of the LSS bus architecture, but possible configurations are not limited to those shown.

7.2.1 A4 Simplex with 1 SoPEC device



Figure 3. Single SoPEC A4 Simplex system



In Figure 3, a single SoPEC device can be used to control two printhead ICs. The SoPEC receives com- 
pressed data through the USB device from the host. The compressed data is processed and transferred to 
the printhead. 



7.2.2 A4 Duplex with 2 SoPEC devices 



Figure 4. Dual SoPEC A4 Duplex system

In Figure 4, two SoPEC devices are used to control two bi-lithic printheads, each with two printhead ICs. Each bi-lithic printhead prints to opposite sides of the same page to achieve duplex printing. The SoPEC connected to the USB is the ISIMaster SoPEC; the remaining SoPEC is an ISISlave. The ISIMaster receives all the compressed page data for both SoPECs and re-distributes the compressed data over the Inter-SoPEC Interface (ISI) bus.
It may not be possible to print an A4 page every 2 seconds in this configuration, since the USB 1.1 connection to the host may not have enough bandwidth. An alternative would be for each SoPEC to have its own USB 1.1 connection. This would allow a faster average print speed.



7.2.3 A3 Simplex with 2 SoPEC devices 



Figure 5. Dual SoPEC A3 simplex system



In Figure 5, two SoPEC devices are used to control one A3 bi-lithic printhead. Each SoPEC controls only one printhead IC (the remaining PHI port typically remains idle). The USB 1.1 connection defines the ISIMaster SoPEC. In this dual SoPEC configuration the compressed page store data is split across 2 SoPECs, giving a total of 4Mbyte page store. This allows the system to use compression rates as in an A4 architecture, but with the increased page size of A3. The ISIMaster receives all the compressed page data for all SoPECs and re-distributes the compressed data over the Inter-SoPEC Interface (ISI) bus.

It may not be possible to print an A3 page every 2 seconds in this configuration, since the USB 1.1 connection to the host will only have enough bandwidth to supply 2Mbytes every 2 seconds. Pages which require more than 2Mbytes every 2 seconds will therefore print slower. An alternative would be for each SoPEC to have its own USB 1.1 connection. This would allow a faster average print speed.
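The bandwidth limit can be expressed as a steady-state bound: on average, a page cannot be printed faster than its compressed data can cross the host link. The sketch below is an idealized model that ignores protocol overhead and any data buffered in advance; its throughput figure of about 1 Mbyte per second is simply the text's 2Mbytes every 2 seconds restated, and the function name is hypothetical.

```c
/* Average time per page (seconds): the slower of the print time and the
   time needed to transfer page_bytes at throughput bytes per second. */
double min_page_period(double page_bytes, double throughput,
                       double print_seconds)
{
    double transfer = page_bytes / throughput;
    return transfer > print_seconds ? transfer : print_seconds;
}
```

With 1 Mbyte per second, a 2Mbyte page still prints every 2 seconds, but a 4Mbyte A3 page takes 4 seconds; giving each SoPEC its own USB 1.1 link (doubling the aggregate throughput) restores the 2 second period.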
7.2.4 A3 Duplex with 4 SoPEC devices

Figure 6. Quad SoPEC A3 duplex system



In Figure 6 a 4 SoPEC system is shown. It contains 2 A3 bi-lithic printheads, one for each side of an A3 page. Each printhead contains 2 printhead ICs, and each printhead IC is controlled by an independent SoPEC device, with the remaining PHI port typically unused. Again the USB 1.1 connection defines the ISIMaster, with the other SoPECs as ISISlaves. In total, the system contains 8Mbytes of compressed page store (2Mbytes per SoPEC), so the increased page size does not degrade the system print quality from that of an A4 simplex printer. The ISIMaster receives all the compressed page data for all SoPECs and re-distributes the compressed data over the Inter-SoPEC Interface (ISI) bus.

It may not be possible to print an A3 page every 2 seconds in this configuration, since the USB 1.1 connection to the host will only have enough bandwidth to supply 2Mbytes every 2 seconds. Pages which require more than 2Mbytes every 2 seconds will therefore print slower. An alternative would be for each SoPEC or set of SoPECs on the same side of the page to have their own USB 1.1 connection. This would allow a faster average print speed.
7.2.5 SoPEC DRAM storage solution: A4 Simplex with 1 printing SoPEC and 1 memory SoPEC

Figure 7. SoPEC A4 Simplex system with extra SoPEC used as DRAM storage



Extra SoPECs can be used for DRAM storage, e.g. in Figure 7 an A4 simplex printer can be built with a single extra SoPEC used for DRAM storage. The DRAM SoPEC can provide guaranteed bandwidth delivery of data to the printing SoPEC. SoPEC configurations can have multiple extra SoPECs used for DRAM storage.
7.2.6 ISI-Bridge chip solution: A3 Duplex system with 4 SoPEC devices

Figure 8. A3 duplex system featuring four printing SoPECs



In Figure 8, an ISI-Bridge chip provides slave-only ISI connections to SoPEC devices. Figure 8 shows an ISI-Bridge chip with 2 separate ISI ports. The ISI-Bridge chip is the ISIMaster on each of the ISI busses it is connected to. All connected SoPECs are ISISlaves. The ISI-Bridge chip will typically have a high bandwidth connection to a host and may have an attached external DRAM for compressed page storage.

An alternative to having an ISI-Bridge chip would be for each SoPEC or each set of SoPECs on the same side of a page to have their own USB 1.1 connection. This would allow a faster average print speed.
8 Page Format and Printflow



When rendering a page, the RIP produces a page header and a number of bands (a non-blank page requires at least one band) for a page. The page header contains high level rendering parameters, and each band contains compressed page data. The size of the band will depend on the memory available to the RIP, the speed of the RIP, and the amount of memory remaining in SoPEC while printing the previous band(s). Figure 9 shows the high level data structure of a number of pages with different numbers of bands in the page.



Figure 9. Pages containing different numbers of bands (blank page, single band page, 2 band page, multi band page)

Each compressed band contains a mandatory band header, an optional bi-level plane, optional sets of interleaved contone planes, and an optional tag data plane (for Netpage enabled applications). Since each of these planes is optional¹, the band header specifies which planes are included with the band. Figure 10 gives a high-level breakdown of the contents of a page band.



Figure 10. Contents of a page band (band header, bi-level plane, contone plane, tag data plane)
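One way to record which optional planes follow a band header is a set of presence flags. The layout below is hypothetical (the flag names, bit assignments and size fields are illustrative, not the SoPEC band header format); it shows the rule that a band must contain at least one plane and that an absent plane carries no data.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    BAND_HAS_BILEVEL = 1u << 0,
    BAND_HAS_CONTONE = 1u << 1,
    BAND_HAS_TAGS    = 1u << 2
};

typedef struct {
    uint32_t plane_flags;    /* OR of BAND_HAS_* flags */
    uint32_t bilevel_bytes;  /* size of each plane, 0 if absent */
    uint32_t contone_bytes;
    uint32_t tag_bytes;
} band_header_t;

/* At least one plane must be present, no unknown flags may be set,
   and only planes that are flagged may have a non-zero size. */
bool band_header_valid(const band_header_t *h)
{
    const uint32_t known = BAND_HAS_BILEVEL | BAND_HAS_CONTONE | BAND_HAS_TAGS;
    if (h->plane_flags == 0 || (h->plane_flags & ~known) != 0)
        return false;
    if (!(h->plane_flags & BAND_HAS_BILEVEL) && h->bilevel_bytes != 0)
        return false;
    if (!(h->plane_flags & BAND_HAS_CONTONE) && h->contone_bytes != 0)
        return false;
    if (!(h->plane_flags & BAND_HAS_TAGS) && h->tag_bytes != 0)
        return false;
    return true;
}

/* Total plane data following the header. */
uint32_t band_payload_bytes(const band_header_t *h)
{
    return h->bilevel_bytes + h->contone_bytes + h->tag_bytes;
}
```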



A single SoPEC has maximum rendering restrictions as follows:

• 1 bi-level plane

• 1 contone interleaved plane set containing a maximum of 4 contone planes

• 1 tag data plane

• a bi-lithic printhead with a maximum of 2 printhead ICs

The requirement for single-sided A4 single SoPEC printing is:

• average contone JPEG compression ratio of 10:1, with a local minimum compression ratio of 5:1 for a single line of interleaved JPEG blocks.

• average bi-level compression ratio of 10:1, with a local minimum compression ratio of 1:1 for a single line.

1. Although a band must contain at least one plane.

If the page contains rendering parameters that exceed these specifications, then the RIP or the Host PC must split the page into a format that can be handled by a single SoPEC.

In the general case, the SoPEC CPU must analyze the page and band headers and generate an appropriate set of register write commands to configure the units in SoPEC for that page. The various bands are passed to the destination SoPEC(s) to locations in DRAM determined by the host.

The host keeps a memory map for the DRAM, and ensures that as a band is passed to a SoPEC, it is stored in a suitable free area in DRAM. Each SoPEC is connected to the ISI bus or USB bus via its Serial Communication Block (SCB). The SoPEC CPU configures the SCB to allow compressed data bands to pass from the USB or ISI through the SCB to SoPEC DRAM. Figure 11 shows an example data flow for a page destined to be printed by a single SoPEC. Band usage information is generated by the individual SoPECs and passed back to the host.



Figure 11. Page data path from host to SoPEC (page/band header, bi-level plane, contone interleaved plane and tag data plane pass from the host RIP through the SCB to SoPEC's DRAM; the CPU turns the headers into register commands for SoPEC's registers)

SoPEC has an addressing mechanism that permits circular band memory allocation, thus facilitating easy memory management. However, it is not strictly necessary that all bands be stored together. As long as the appropriate registers in SoPEC are set up for each band, and a given band is contiguous¹, the memory can be allocated in any way.



1. Contiguous allocation also includes wrapping around in SoPEC's band store memory.
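A circular band store, including bands that wrap around the end of the store, can be modelled on the host side as a simple ring allocator. This is a hypothetical sketch, not SoPEC's addressing mechanism or register interface: it assumes bands are released in the same order they were allocated, once SoPEC reports them consumed.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t size;  /* band store size in bytes (must be > 0) */
    uint32_t head;  /* offset at which the next band will start */
    uint32_t used;  /* bytes currently holding unconsumed bands */
} band_ring_t;

/* Reserve len bytes for a new band. On success stores the band's start
   offset in *start and returns true; returns false if there is not yet
   enough free space (the host must wait for bands to be consumed). */
bool ring_alloc(band_ring_t *r, uint32_t len, uint32_t *start)
{
    if (len > r->size - r->used)
        return false;
    *start = r->head;
    r->head = (uint32_t)(((uint64_t)r->head + len) % r->size);
    r->used += len;
    return true;
}

/* Release the oldest band (len bytes) after it has been printed. */
void ring_free(band_ring_t *r, uint32_t len)
{
    r->used = (len <= r->used) ? r->used - len : 0;
}

/* Physical offset of byte `offset` within a band starting at band_start,
   wrapping around the end of the store. */
uint32_t ring_addr(const band_ring_t *r, uint32_t band_start, uint32_t offset)
{
    return (uint32_t)(((uint64_t)band_start + offset) % r->size);
}
```

In a 100-byte store, a 50-byte band allocated at offset 60 occupies offsets 60-99 and then 0-9; its byte 45 lives at physical offset 5.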
8.1 Print engine example page format

This section describes a possible format of compressed pages expected by the embedded CPU in SoPEC. The format is generated by software in the host PC and interpreted by embedded software in SoPEC. This section indicates the type of information in a page format structure, but implementations need not be limited to this format. The host PC can optionally perform the majority of the header processing.

The compressed format and the print engines are designed to allow real-time page expansion during printing, to ensure that printing is never interrupted in the middle of a page due to data underrun.

The page format described here is for a single black bi-level layer, a contone layer, and a Netpage tag layer. The black bi-level layer is defined to composite over the contone layer.

The black bi-level layer consists of a bitmap containing a 1-bit opacity for each pixel. This black layer matte has a resolution which is an integer or non-integer factor of the printer's dot resolution. The highest supported resolution is 1600 dpi, i.e. the printer's full dot resolution.

The contone layer, optionally passed in as YCrCb, consists of a 24-bit CMY or 32-bit CMYK color for 
each pixel. This contone image has a resolution which is an integer or non-integer factor of the printer's 
dot resolution. The requirement for a single SoPEC is to support 1 side per 2 seconds A4/Letter printing at 
a resolution of 267 ppi, i.e. one-sixth the printer's dot resolution. 

Non-integer scaling can be performed on both the contone and bi-level images. Only integer scaling can be
performed on the tag data.

The black bi-level layer and the contone layer are both in compressed form for efficient storage in the
printer's internal memory.

8.1.1 Page structure 

A single SoPEC is able to print with full edge bleed for Letter and A3 via different stitch part combina- 
tions of the bi-lithic printhead. It imposes no margins and so has a printable page area which corresponds 
to the size of its paper. The target page size is constrained by the printable page area, less the explicit (tar- 
get) left and top margins specified in the page description. These relationships are illustrated below. 



Figure 12. Page structure
(Figure labels: target page; printable page area (physical page); target bottom margin.)
8.1.2 Compressed page format

Apart from being implicitly defined in relation to the printable page area, each page description is com- 
plete and self-contained. There is no data stored separately from the page description to which the page 
description refers.¹ The page description consists of a page header which describes the size and resolution
of the page, followed by one or more page bands which describe the actual page content.

8.1.2.1 Page header

Table 3 shows an example format of a page header. 



Table 3. Page header format

Field | Format | Description
signature | 16-bit integer | Page header format signature.
version | 16-bit integer | Page header format version number.
structure size | 16-bit integer | Size of page header.
band count | 16-bit integer | Number of bands specified for this page.
target resolution (dpi) | 16-bit integer | Resolution of target page. This is always 1600 for the Memjet printer.
target page width | 16-bit integer | Width of target page, in dots.
target page height | 32-bit integer | Height of target page, in dots.
target left margin for black and contone | 16-bit integer | Width of target left margin, in dots, for black and contone.
target top margin for black and contone | 16-bit integer | Height of target top margin, in dots, for black and contone.
target right margin for black and contone | 16-bit integer | Width of target right margin, in dots, for black and contone.
target bottom margin for black and contone | 16-bit integer | Height of target bottom margin, in dots, for black and contone.
target left margin for tags | 16-bit integer | Width of target left margin, in dots, for tags.
target top margin for tags | 16-bit integer | Height of target top margin, in dots, for tags.
target right margin for tags | 16-bit integer | Width of target right margin, in dots, for tags.
target bottom margin for tags | 16-bit integer | Height of target bottom margin, in dots, for tags.
generate tags | 16-bit integer | Specifies whether to generate tags for this page (0 = no, 1 = yes).
fixed tag data | 128-bit integer | This is only valid if generate tags is set.
tag vertical scale factor | 16-bit integer | Scale factor in vertical direction from tag data resolution to target resolution. Valid range = 1-511. Integer scaling only.
tag horizontal scale factor | 16-bit integer | Scale factor in horizontal direction from tag data resolution to target resolution. Valid range = 1-511. Integer scaling only.
bi-level layer vertical scale factor | 16-bit integer | Scale factor in vertical direction from bi-level resolution to target resolution (must be 1 or greater). May be non-integer. Expressed as a fraction with the upper 8 bits the numerator and the lower 8 bits the denominator.
bi-level layer horizontal scale factor | 16-bit integer | Scale factor in horizontal direction from bi-level resolution to target resolution (must be 1 or greater). May be non-integer. Expressed as a fraction with the upper 8 bits the numerator and the lower 8 bits the denominator.
bi-level layer page width | 16-bit integer | Width of bi-level layer page, in pixels.
bi-level layer page height | 32-bit integer | Height of bi-level layer page, in pixels.
contone flags | 16-bit integer | Defines the color conversion that is required for the JPEG data. Bits 2-0 specify how many contone planes there are (e.g. 3 for CMY and 4 for CMYK). Bit 3 specifies whether the first 3 color planes need to be converted back from YCrCb to CMY; only valid if bits 2-0 = 3 or 4 (0 = no conversion, leave JPEG colors alone; 1 = color convert). Bits 7-4 specify whether the YCrCb was generated directly from CMY, or whether it was converted to RGB first via the step R = 255-C, G = 255-M, B = 255-Y; each of the color planes can be individually inverted (bit 4 = color plane 0, bit 5 = color plane 1, bit 6 = color plane 2, bit 7 = color plane 3; 0 = do not invert, 1 = invert). Bit 8 specifies whether the contone data is JPEG compressed or non-compressed (0 = JPEG compressed, 1 = non-compressed). The remaining bits are reserved (0).
contone vertical scale factor | 16-bit integer | Scale factor in vertical direction from contone channel resolution to target resolution. Valid range = 1-255. May be non-integer. Expressed as a fraction with the upper 8 bits the numerator and the lower 8 bits the denominator.
contone horizontal scale factor | 16-bit integer | Scale factor in horizontal direction from contone channel resolution to target resolution. Valid range = 1-255. May be non-integer. Expressed as a fraction with the upper 8 bits the numerator and the lower 8 bits the denominator.
contone page width | 16-bit integer | Width of contone page, in contone pixels.
contone page height | 32-bit integer | Height of contone page, in contone pixels.
reserved | up to 128 bytes | Reserved and 0; pads out the page header to a multiple of 128 bytes.

1. SoPEC relies on dither matrices and tag structures to have already been set up, but these are not considered to be part of a general page format. It is trivial to extend the page format to allow exact specification of dither matrices and tag structures.



The page header contains a signature and version which allow the CPU to identify the page header format.
If the signature and/or version are missing or incompatible with the CPU, then the CPU can reject the
page.
The contone flags define how many contone layers are present, which typically is used for defining
whether the contone layer is CMY or CMYK. Additionally, if the color planes are CMY, they can be
optionally stored as YCrCb, and further optionally color space converted from CMY directly or via RGB.
Finally, the contone data is specified as being either JPEG compressed or non-compressed.
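A minimal sketch of unpacking the contone flags field, following the bit layout given in Table 3. The class and function names are hypothetical, and rejecting non-zero reserved bits (and bit 3 with a plane count other than 3 or 4) is one reasonable reading of "reserved (0)" and "only valid if", not documented SoPEC behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContoneFlags:
    num_planes: int           # bits 2-0, e.g. 3 (CMY) or 4 (CMYK)
    ycrcb_to_cmy: bool        # bit 3: convert first 3 planes back from YCrCb
    invert: tuple[bool, ...]  # bits 4-7: invert color planes 0-3
    non_compressed: bool      # bit 8: 0 = JPEG compressed, 1 = non-compressed


def decode_contone_flags(flags: int) -> ContoneFlags:
    """Unpack the 16-bit contone flags field laid out in Table 3."""
    if not 0 <= flags <= 0xFFFF:
        raise ValueError("contone flags is a 16-bit field")
    if flags >> 9:
        raise ValueError("reserved bits 15-9 must be 0")
    num_planes = flags & 0x7
    ycrcb_to_cmy = bool((flags >> 3) & 1)
    if ycrcb_to_cmy and num_planes not in (3, 4):
        raise ValueError("bit 3 is only valid when bits 2-0 are 3 or 4")
    invert = tuple(bool((flags >> (4 + plane)) & 1) for plane in range(4))
    return ContoneFlags(num_planes, ycrcb_to_cmy, invert, bool((flags >> 8) & 1))


# CMYK stored as JPEG-compressed YCrCb, with planes 0-2 inverted (CMY -> RGB) first.
print(decode_contone_flags(0b0_0111_1_100))
```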

The page header defines the resolution and size of the target page. The bi-level and contone layers are
clipped to the target page if necessary. This happens whenever the bi-level or contone scale factors are not
factors of the target page width or height.

The target left, top, right and bottom margins define the positioning of the target page within the printable
page area.

The tag parameters specify whether or not Netpage tags should be produced for this page and what
orientation the tags should be produced at (landscape or portrait mode). The fixed tag data is also provided.

The contone, bi-level and tag layer parameters define the page size and the scale factors. 
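The non-integer scale factors are stored as fractions (upper 8 bits numerator, lower 8 bits denominator). The sketch below decodes one and shows how a scaled layer overhangs, and is clipped to, the target page when the scale factor does not divide the target width. The function names and the 8-inch (12800-dot) example width are illustrative assumptions; the 6:1 contone factor corresponds to the 267 ppi contone resolution mentioned earlier.

```python
import math
from fractions import Fraction


def decode_scale_factor(raw: int) -> Fraction:
    """16-bit scale factor: upper 8 bits numerator, lower 8 bits denominator."""
    numerator, denominator = (raw >> 8) & 0xFF, raw & 0xFF
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return Fraction(numerator, denominator)


def layer_pixels_needed(target_dots: int, raw_scale: int) -> int:
    """Smallest layer extent, in layer pixels, that covers target_dots."""
    return math.ceil(target_dots / decode_scale_factor(raw_scale))


def clipped_dots(layer_pixels: int, raw_scale: int, target_dots: int) -> int:
    """Dots of the scaled layer that extend past the target page and are clipped."""
    scaled = layer_pixels * decode_scale_factor(raw_scale)
    return max(0, math.ceil(scaled) - target_dots)


# Contone at one-sixth of 1600 dpi (about 267 ppi): scale factor 6/1 = 0x0601.
width_px = layer_pixels_needed(12800, 0x0601)  # 12800 dots = 8 inches at 1600 dpi
print(width_px, clipped_dots(width_px, 0x0601, 12800))
```

Because 12800 is not a multiple of 6, the contone layer needs 2134 pixels, which scale to 12804 dots, so 4 dots are clipped.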

8.1.2.2 Band format

Table 4 shows the format of the page band header. 



Table 4. Band header format

Field | Format | Description
signature | 16-bit integer | Page band header format signature.
version | 16-bit integer | Page band header format version number.
structure size | 16-bit integer | Size of page band header.
bi-level layer band height | 16-bit integer | Height of bi-level layer band, in black pixels.
bi-level layer band data size | 32-bit integer | Size of bi-level layer band data, in bytes.
contone band height | 16-bit integer | Height of contone band, in contone pixels.
contone band data size | 32-bit integer | Size of contone plane band data, in bytes.
tag band height | 16-bit integer | Height of tag band, in dots.
tag band data size | 32-bit integer | Size of unencoded tag data band, in bytes. Can be 0, which indicates that no tag data is provided.
reserved | up to 128 bytes | Reserved and 0; pads out the band header to a multiple of 128 bytes.



The bi-level layer parameters define the height of the black band, and the size of its compressed band data. 
The variable-size black data follows the page band header. 

The contone layer parameters define the height of the contone band, and the size of its compressed page 
data. The variable-size contone data follows the black data. 

The tag band data is the set of variable tag data half-lines as required by the tag encoder. The format of the 
tag data is found in Section 26.5.2. The tag band data follows the contone data. 

Table 5 shows the format of the variable-size compressed band data which follows the page band header. 



Table 5. Page band data format

Field | Format | Description
black data | Modified G4 facsimile bitstream¹ | Compressed bi-level layer.
contone data | JPEG bytestream | Compressed contone data layer.
tag data map | Tag data array | Tag data format. See Section 26.5.2.

1. See section 8.1.2.3 on page 31 for note regarding the use of this standard.

The start of each variable-size segment of band data should be aligned to a 256-bit DRAM word boundary.
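A minimal sketch of laying out one band in memory: the band header is padded to a multiple of 128 bytes (Table 4), and each following segment, in Table 5 order, starts on a 256-bit (32-byte) DRAM word boundary. It assumes the band header itself starts on such a boundary and reports offsets relative to it; the function names are hypothetical.

```python
DRAM_WORD_BYTES = 256 // 8   # one 256-bit DRAM word = 32 bytes
HEADER_MULTIPLE = 128        # band header is padded to a multiple of 128 bytes


def align_up(n: int, alignment: int) -> int:
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def band_layout(header_size: int, bilevel_size: int,
                contone_size: int, tag_size: int) -> dict[str, int]:
    """Byte offsets of each band segment, relative to the band header start.

    Segments follow the header in Table 5 order (black, contone, tag), and
    each one starts on a 256-bit DRAM word boundary.
    """
    offset = align_up(header_size, HEADER_MULTIPLE)
    layout = {}
    for name, size in (("bilevel", bilevel_size),
                       ("contone", contone_size),
                       ("tag", tag_size)):
        offset = align_up(offset, DRAM_WORD_BYTES)
        layout[name] = offset
        offset += size
    layout["end"] = offset
    return layout


print(band_layout(header_size=40, bilevel_size=1000, contone_size=5000, tag_size=0))
```

With a tag band data size of 0 (no tag data), the tag segment is empty and its offset coincides with the end of the band.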

The following sections describe the format of the compressed bi-level layers and the compressed contone
layer. Section 26.5.1 on page 365 describes the format of the tag data structures.

8.1.2.3 Bi-level data compression

The (typically 1600 dpi) black bi-level layer is losslessly compressed using Silverbrook Modified Group 4
(SMG4) compression, which is a version of Group 4 Facsimile compression [18] without Huffman and
with simplified run length encodings. Typically compression ratios exceed 10:1. The encodings are listed in
Table 6 and Table 7.



Table 6. Bi-level group 4 facsimile style compression encodings

Encoding | Description
1000 | Pass Command: a0 ← b2, skip next two edges
1 | Vertical(0): a0 ← b1, color = !color
110 | Vertical(1): a0 ← b1 + 1, color = !color
010 | Vertical(-1): a0 ← b1 - 1, color = !color
110000 | Vertical(2): a0 ← b1 + 2, color = !color
010000 | Vertical(-2): a0 ← b1 - 2, color = !color
100000 | Vertical(3): a0 ← b1 + 3, color = !color
000000 | Vertical(-3): a0 ← b1 - 3, color = !color
<RL><RL>100 | Horizontal: a0 ← a0 + <RL> + <RL>

SMG4 has a pass through mode to cope with local negative compression. Pass through mode is activated
by a special run-length code. Pass through mode continues either to the end of the line or for a
pre-programmed number of bits, whichever is shorter. The special run-length code is always executed as a
run-length code, followed by pass through. The pass through escape code is a medium length run-length with
a run of less than or equal to 31.



Table 7. Run length (RL) encodings

Encoding | Description
RRRRR1 | Short Black Runlength (5 bits)
RRRRR1 | Short White Runlength (5 bits)
RRRRRRRRRR10 | Medium Black Runlength (10 bits)
RRRRRRRR10 | Medium White Runlength (8 bits)
RRRRRRRRRR10 | Medium Black Runlength with RRRRRRRRRR <= 31. Enter pass through
RRRRRRRR10 | Medium White Runlength with RRRRRRRR <= 31. Enter pass through
RRRRRRRRRRRRRRR00 | Long Black Runlength (15 bits)
RRRRRRRRRRRRRRR00 | Long White Runlength (15 bits)
Since the compression is a bitstream, the encodings are read right (least significant bit) to left (most
significant bit). The run lengths given as RRRR in Table 7 are read in the same way (least significant bit at
the right to most significant bit at the left).

Each band of bi-level data is optionally self-contained. The first line of each band is therefore based on a
'previous' blank line or on the last line of the previous band.



8.1.2.3.1 Group 3 and 4 facsimile compression

The Group 3 Facsimile compression algorithm [18] losslessly compresses bi-level data for transmission
over slow and noisy telephone lines. The bi-level data represents scanned black text and graphics on a
white background, and the algorithm is tuned for this class of images (it is explicitly not tuned, for
example, for halftoned bi-level images). The 1D Group 3 algorithm runlength-encodes each scanline and
then Huffman-encodes the resulting runlengths. Runlengths in the range 0 to 63 are coded with terminating
codes. Runlengths in the range 64 to 2623 are coded with make-up codes, each representing a multiple of
64, followed by a terminating code. Runlengths exceeding 2623 are coded with multiple make-up codes
followed by a terminating code. The Huffman tables are fixed, but are separately tuned for black and white
runs (except for make-up codes above 1728, which are common). When possible, the 2D Group 3 algorithm
encodes a scanline as a set of short edge deltas (0, ±1, ±2, ±3) with reference to the previous scanline.
The delta symbols are entropy-encoded (so that the zero delta symbol is only one bit long etc.). Edges
within a 2D-encoded line which can't be delta-encoded are runlength-encoded, and are identified by a
prefix. 1D- and 2D-encoded lines are marked differently. 1D-encoded lines are generated at regular
intervals, whether actually required or not, to ensure that the decoder can recover from line noise with
minimal image degradation. 2D Group 3 achieves compression ratios of up to 6:1 [28].

The Group 4 Facsimile algorithm [18] losslessly compresses bi-level data for transmission over error-free
communications lines (i.e. the lines are truly error-free, or error-correction is done at a lower protocol
level). The Group 4 algorithm is based on the 2D Group 3 algorithm, with the essential modification that
since transmission is assumed to be error-free, 1D-encoded lines are no longer generated at regular
intervals as an aid to error-recovery. Group 4 achieves compression ratios ranging from 20:1 to 60:1 for the
CCITT set of test images [28].

The design goals and performance of the Group 4 compression algorithm qualify it as a compression
algorithm for the bi-level layers. However, its Huffman tables are tuned to a lower scanning resolution
(100-400 dpi), and it encodes runlengths exceeding 2623 awkwardly.

8.1.2.4 Contone data compression

The contone layer (CMYK) is either a non-compressed bytestream or is compressed to an interleaved
JPEG bytestream. The JPEG bytestream is complete and self-contained. It contains all data required for
decompression, including quantization and Huffman tables.

The contone data is optionally converted to YCrCb before being compressed (there is no specific
advantage in color-space converting if not compressing). Additionally, the CMY contone pixels are
optionally converted (on an individual basis) to RGB before color conversion using R=255-C, G=255-M,
B=255-Y. Optional bitwise inversion of the K plane may also be performed. Note that this CMY to RGB
conversion is not intended to be accurate for display purposes, but rather for the purposes of later
converting to YCrCb. The inverse transform will be applied before printing.

8.1.2.4.1 JPEG compression

The JPEG compression algorithm [23] lossily compresses a contone image at a specified quality level. It
introduces imperceptible image degradation at compression ratios below 5:1, and negligible image
degradation at compression ratios below 10:1 [29].
JPEG typically first transforms the image into a color space which separates luminance and chrominance
into separate color channels. This allows the chrominance channels to be subsampled without appreciable
loss because of the human visual system's relatively greater sensitivity to luminance than chrominance.
After this first step, each color channel is compressed separately.

The image is divided into 8x8 pixel blocks. Each block is then transformed into the frequency domain via 
a discrete cosine transform (DCT). This transformation has the effect of concentrating image energy in rel- 
atively lower-frequency coefficients, which allows higher-frequency coefficients to be more crudely quan- 
tized. This quantization is the principal source of compression in JPEG. Further compression is achieved 
by ordering coefficients by frequency to maximize the likelihood of adjacent zero coefficients, and then 
runlength-encoding runs of zeroes. Finally, the runlengths and non-zero frequency coefficients are entropy 
coded. Decompression is the inverse process of compression. 
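The per-block steps above (DCT, quantization, frequency ordering) can be sketched as follows. This is the textbook baseline-JPEG DCT-II and zig-zag scan written naively for clarity, not SoPEC's CDU implementation; the function names and the flat quantization table in the example are illustrative assumptions.

```python
import math


def dct_8x8(block: list[list[float]]) -> list[list[float]]:
    """Naive forward 2-D DCT-II of one 8x8 block, in the form used by baseline JPEG.

    Baseline JPEG level-shifts 8-bit samples by -128 before this step; the
    caller is assumed to have done that.
    """
    def alpha(k: int) -> float:
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):          # vertical frequency
        for v in range(8):      # horizontal frequency
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / 16)
                              * math.cos((2 * x + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * alpha(u) * alpha(v) * total
    return out


def zigzag_order() -> list[tuple[int, int]]:
    """(row, col) positions ordered from lowest to highest frequency."""
    def key(rc: tuple[int, int]) -> tuple[int, int]:
        r, c = rc
        diagonal = r + c
        # odd diagonals run down-left (row ascending), even ones up-right
        return diagonal, r if diagonal % 2 else -r
    return sorted(((r, c) for r in range(8) for c in range(8)), key=key)


def quantize_and_scan(coeffs: list[list[float]], qtable: list[list[int]]) -> list[int]:
    """Quantize (the lossy step) and emit coefficients in zig-zag order,
    which groups the high-frequency zeros together for run-length coding."""
    return [round(coeffs[r][c] / qtable[r][c]) for r, c in zigzag_order()]


flat = [[10.0] * 8 for _ in range(8)]       # a flat (already level-shifted) block
print(quantize_and_scan(dct_8x8(flat), [[16] * 8 for _ in range(8)])[:4])
```

A flat block concentrates all of its energy in the single DC coefficient, so after quantization the scan is one non-zero value followed by a long run of zeros.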

8.1.2.4.2 Non-compressed format 

If the contone data is non-compressed, it must be in a block-based format bytestream with the same pixel
order as would be produced by a JPEG decoder. The bytestream therefore consists of a series of 8x8 blocks
of the original image, starting with the top left 8x8 block, and working horizontally across the page (as it
will be printed) until the top rightmost 8x8 block, then the next row of 8x8 blocks (left to right) and so on
until the lower row of 8x8 blocks (left to right). Each 8x8 block consists of 64 8-bit pixels for color plane
0 (representing 8 rows of 8 pixels in the order top left to bottom right) followed by 64 8-bit pixels for color
plane 1 and so on for up to a maximum of 4 color planes.

If the original image is not a multiple of 8 pixels in X or Y, padding must be present (the extra pixel data 
will be ignored by the setting of margins). 
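A minimal sketch of producing the non-compressed block order described above from separate color planes. The function name is hypothetical, all planes are assumed to share the same dimensions, and padding with 0 is an arbitrary choice, since the padded pixels are ignored by the margins anyway.

```python
def to_block_stream(planes: list[list[list[int]]]) -> bytes:
    """Serialize 1-4 color planes into the non-compressed block-based order.

    Each plane is a list of rows of 8-bit values, and all planes are assumed
    to share the same dimensions. Pixels beyond the image edge are padded
    with 0 (an arbitrary choice; the margins cause padding to be ignored).
    """
    if not 1 <= len(planes) <= 4:
        raise ValueError("between 1 and 4 color planes are supported")
    height, width = len(planes[0]), len(planes[0][0])
    block_rows = -(-height // 8)     # ceiling division
    block_cols = -(-width // 8)
    out = bytearray()
    for by in range(block_rows):                 # rows of blocks, top to bottom
        for bx in range(block_cols):             # blocks, left to right
            for plane in planes:                 # 64 bytes of plane 0, then plane 1, ...
                for y in range(by * 8, by * 8 + 8):       # 8 rows, top to bottom
                    for x in range(bx * 8, bx * 8 + 8):   # 8 pixels, left to right
                        inside = y < height and x < width
                        out.append(plane[y][x] if inside else 0)
    return bytes(out)


cyan = [[x for x in range(16)] for _ in range(8)]
magenta = [[100 + x for x in range(16)] for _ in range(8)]
stream = to_block_stream([cyan, magenta])
```

For this 16x8 two-plane image the stream holds 256 bytes: block 0 of plane 0, block 0 of plane 1, then block 1 of each plane.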

8.1.2.4.3 Compressed format 

If the contone data is compressed, the first memory band contains JPEG headers (including tables) plus
MCUs (minimum coded units). The ratio of space between the various color planes in the JPEG stream is
1:1:1:1. No subsampling is permitted. Banding can be completely arbitrary, i.e. there can be multiple JPEG
images per band or 1 JPEG image divided over multiple bands. The break between bands is only memory
alignment based.

8.1.2.4.4 Conversion of RGB to YCrCb (in RIP) 

YCrCb is defined as per CCIR 601-1 [20] except that Y, Cr and Cb are normalized to occupy all 256 levels
of an 8-bit binary encoding and take account of the actual hardware implementation of the inverse
transform within SoPEC.

The exact color conversion computation is as follows: 

$$
\begin{aligned}
Y^* &= \tfrac{9805}{32768}R + \tfrac{19235}{32768}G + \tfrac{3728}{32768}B \\
Cr^* &= \tfrac{16375}{32768}R - \tfrac{13716}{32768}G - \tfrac{2659}{32768}B + 128 \\
Cb^* &= -\tfrac{5529}{32768}R - \tfrac{10846}{32768}G + \tfrac{16375}{32768}B + 128
\end{aligned}
$$

Y, Cr and Cb are obtained by rounding Y*, Cr* and Cb* to the nearest integer. There is no need for
saturation since the ranges of Y*, Cr* and Cb* after rounding are [0-255], [1-255] and [1-255] respectively.
Note that full accuracy is possible with 24 bits. See [14] for more information.
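The conversion can be evaluated exactly in integer arithmetic, because every coefficient is a multiple of 1/32768. In the following minimal sketch the coefficients come from the equations above, while the function names and the round-half-up tie-breaking are assumptions (the text only says "nearest integer"). Every intermediate numerator stays below 2^24, which appears consistent with the 24-bit note.

```python
def cmy_to_rgb(c: int, m: int, y: int) -> tuple[int, int, int]:
    """Optional per-plane inversion R = 255-C, G = 255-M, B = 255-Y (not colorimetric)."""
    return 255 - c, 255 - m, 255 - y


def rgb_to_ycrcb(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Fixed-point form of the equations above, coefficients in units of 1/32768.

    Adding 0.5 (16384/32768) before the arithmetic shift rounds to nearest,
    with ties rounding up. The +128 offset is folded in first so that every
    numerator is non-negative and the shift behaves as a floor.
    """
    half = 1 << 14
    offset = 128 << 15
    y = (9805 * r + 19235 * g + 3728 * b + half) >> 15
    cr = (16375 * r - 13716 * g - 2659 * b + offset + half) >> 15
    cb = (-5529 * r - 10846 * g + 16375 * b + offset + half) >> 15
    return y, cr, cb


print(rgb_to_ycrcb(*cmy_to_rgb(255, 0, 0)))  # cyan ink -> RGB (0, 255, 255)
```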
SoPEC ASIC

9 Overview

The Small Office Home Office Print Engine Controller (SoPEC) is a page rendering engine ASIC that 
takes compressed page images as input, and produces decompressed page images at up to 6 channels of bi- 
level dot data as output. The bi-level dot data is generated for the Memjet bi-lithic printhead. The dot gen- 
eration process takes account of printhead construction, dead nozzles, and allows for fixative generation. 

A single SoPEC can control 2 bi-lithic printheads and up to 6 color channels at 10,000 lines/sec¹, equating
to 30 pages per minute. A single SoPEC can perform full-bleed printing of A3, A4 and Letter pages. The 6
channels of colored ink are the expected maximum in a consumer SOHO, or office Bi-lithic printing
environment:

• CMY, for regular color printing.

• K, for black text, line graphics and gray-scale printing.

• IR (infrared), for Netpage-enabled [5] applications.

• F (fixative), to enable printing at high speed. Because the bi-lithic printer is capable of printing so fast,
a fixative may be required to enable the ink to dry before the page touches the page already printed.
Otherwise the pages may bleed on each other. In low speed printing environments the fixative may not
be required.

SoPEC is color space agnostic. Although it can accept contone data as CMYX or RGBX, where X is an
optional 4th channel, it also can accept contone data in any print color space. Additionally, SoPEC
provides a mechanism for arbitrary mapping of input channels to output channels, including combining dots
for ink optimization, generation of channels based on any number of other channels etc. However, inputs
are typically CMYK for contone input, K for the bi-level input, and the optional Netpage tag dots are
typically rendered to an infra-red layer. A fixative channel is typically generated for fast printing applications.

SoPEC is resolution agnostic. It merely provides a mapping between input resolutions and output
resolutions by means of scale factors. The expected output resolution is 1600 dpi, but SoPEC actually has no
knowledge of the physical resolution of the Bi-lithic printhead.

SoPEC is page-length agnostic. Successive pages are typically split into bands and downloaded into the
page store as each band of information is consumed and becomes free.

SoPEC provides an interface for synchronization with other SoPECs. This allows simple multi-SoPEC
solutions for simultaneous A3/A4/Letter duplex printing. However, SoPEC is also capable of printing only
a portion of a page image. Combining synchronization functionality with partial page rendering allows
multiple SoPECs to be readily combined for alternative printing requirements including simultaneous
duplex printing and wide format printing.

Table 8 lists some of the features and corresponding benefits of SoPEC.



Table 8. Features and Benefits of SoPEC

Feature | Benefits
Optimised print architecture in hardware | 30ppm full page photographic quality color printing from a desktop PC
0.13micron CMOS (>3 million transistors) | High speed; Low cost; High functionality
900 Million dots per second |
10,000 lines per second at 1600 dpi | 0.5 A4/Letter pages per SoPEC chip per second
1 chip drives up to 133,920 nozzles | Low cost page-width printers
1 chip drives up to 6 color planes | 99% of SoHo printers can use 1 SoPEC device
Integrated DRAM | No external memory required, leading to low cost systems
Power saving sleep mode | SoPEC can enter a power saving sleep mode to reduce power dissipation between print jobs
JPEG expansion | Low bandwidth from PC; Low memory requirements in printer
Lossless bitplane expansion | High resolution text and line art with low bandwidth from PC (e.g. over USB)
Netpage tag expansion | Generates interactive paper
Stochastic dispersed dot dither | Optically smooth image quality; No moire effects
Hardware compositor for 6 image planes | Pages composited in real-time
Dead nozzle compensation | Extends printhead life and yield; Reduces printhead cost
Color space agnostic | Compatible with all inksets and image sources including RGB, CMYK, spot, CIE L*a*b*, hexachrome, YCrCbK, sRGB and other
Color space conversion | Higher quality / lower bandwidth
Computer interface | USB1.1 interface to Host and ISI interface to ISI-Bridge chip thereby allowing connection to IEEE 1394, Bluetooth etc.
Cascadable in resolution | Printers of any resolution
Cascadable in color depth | Special color sets e.g. hexachrome can be used
Cascadable in image size | Printers of any width up to 16 inches
Cascadable in pages | Printers can print both sides simultaneously
Cascadable in speed | Higher speeds are possible by having each SoPEC print one vertical strip of the page.
Fixative channel data generation | Extremely fast ink drying without wastage
Built-in security | Revenue models are protected
Undercolor removal on dot-by-dot basis | Reduced ink usage
Does not require fonts for high speed operation | No font substitution or missing fonts
Flexible printhead configuration | Many configurations of printheads are supported by one chip type
Drives Bi-lithic printheads directly | No print driver chips required, results in lower cost
Determines dot accurate ink usage | Removes need for physical ink monitoring system in ink cartridges

1. 10,000 lines per second equates to 30 A4/Letter pages per minute at 1600 dpi.


9.1 Printing rates 

The required printing rate for SoPEC is 30 sheets per minute with an inter-sheet spacing of 4 cm. A rate of
30 sheets per minute allows 2 seconds per sheet; at 1600 dpi (approximately 63 dots/mm), achieving this
print rate requires the following line times.

With no inter-sheet gap:

$$\frac{2\ \mathrm{s}}{300\ \mathrm{mm} \times 63\ \mathrm{dots/mm}} = 105.8\ \mu\mathrm{s}\ \text{per line}$$

With a 4 cm inter-sheet gap:

$$\frac{2\ \mathrm{s}}{340\ \mathrm{mm} \times 63\ \mathrm{dots/mm}} = 93.3\ \mu\mathrm{s}\ \text{per line}$$

A printline for an A4 page consists of 13824 nozzles across the page [2]. At a system clock rate of 160
MHz, 13824 dots of data can be generated in 86.4 µs. Therefore data can be generated fast enough
to meet the printing speed requirement. This print data must then be delivered to the printheads.

Printheads can be made up of 5:5, 6:4, 7:3 and 8:2 inch printhead combinations [2]. Print data is
transferred to both printheads in a pair simultaneously. This means the longest time to print a line is
determined by the time to transfer print data to the longest print segment. There are 9744 nozzles across a
7 inch printhead. The print data is transferred to the printhead at a rate of 106 MHz (2/3 of the system clock
rate) per color plane. This means that it will take 91.9 µs to transfer a single line for a 7:3 printhead configuration.

So we can meet the requirement of 30 sheets per minute printing with a 4 cm gap with a 7:3 printhead
combination. There are 11160 nozzles across an 8 inch printhead. To transfer the data to the printhead at
106 MHz will take 105.3 µs. So an 8:2 printhead combination printing with an inter-sheet gap will print
slower than 30 sheets per minute.
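The line-time budget and transfer times above can be reproduced with a few lines of arithmetic. This sketch uses only the figures quoted in the text (300 mm sheet, 4 cm gap, 63 dots/mm, 160 MHz generation clock, 106 MHz transfer rate, 9744 and 11160 nozzles) and assumes one dot per clock per color plane with no other transfer overhead.

```python
SHEET_TIME_S = 60 / 30     # 30 sheets per minute -> 2 s per sheet
DOTS_PER_MM = 63           # 1600 dpi / 25.4 mm per inch, rounded as in the text
GEN_CLOCK_HZ = 160e6       # system clock: one dot generated per cycle
PHI_CLOCK_HZ = 106e6       # printhead transfer rate (2/3 of 160 MHz, rounded as in the text)


def line_budget_us(feed_length_mm: float) -> float:
    """Time available per print line if each sheet (plus any gap) takes 2 s."""
    return SHEET_TIME_S / (feed_length_mm * DOTS_PER_MM) * 1e6


def clocks_to_us(count: int, clock_hz: float) -> float:
    """Time to handle `count` dots at one dot per clock."""
    return count / clock_hz * 1e6


budget = line_budget_us(300 + 40)  # 300 mm sheet plus the 4 cm inter-sheet gap
print(f"generate 13824 dots: {clocks_to_us(13824, GEN_CLOCK_HZ):.1f} us")
for combo, longest_segment_nozzles in (("7:3", 9744), ("8:2", 11160)):
    t = clocks_to_us(longest_segment_nozzles, PHI_CLOCK_HZ)
    verdict = "meets" if t <= budget else "misses"
    print(f"{combo}: {t:.2f} us per line, {verdict} the {budget:.2f} us budget")
```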

9.2 SoPEC BASIC ARCHITECTURE 

From the highest point of view the SoPEC device consists of 3 distinct subsystems:

• CPU Subsystem 

• DRAM Subsystem 

• Print Engine Pipeline (PEP) Subsystem 

See Figure 13 for a block level diagram of SoPEC.

9.2.1 CPU Subsystem

The CPU subsystem controls and configures all aspects of the other subsystems. It provides general
support for interfacing and synchronising the external printer with the internal print engine. It also controls
the low speed communication to the QA chips. The CPU subsystem contains various peripherals to aid the
CPU, such as GPIO (includes motor control), interrupt controller, LSS Master and general timers. The
Serial Communications Block (SCB) on the CPU subsystem provides a full speed USB1.1 interface to the
Host as well as an Inter SoPEC Interface (ISI) to other SoPEC devices.

9.2.2 DRAM Subsystem 

The DRAM subsystem accepts requests from the CPU, Serial Communications Block (SCB) and blocks 
within the PEP subsystem. The DRAM subsystem (in particular the DIU) arbitrates the various requests 
and determines which request should win access to the DRAM. The DIU arbitrates based on configured 
parameters, to allow sufficient access to DRAM for all requestors. The DIU also hides the implementation 
specifics of the DRAM such as page size, number of banks, refresh rates etc. 

9.2.3 Print Engine Pipeline (PEP) subsystem

The Print Engine Pipeline (PEP) subsystem accepts compressed pages from DRAM and renders them to
bi-level dots for a given print line destined for a printhead interface that communicates directly with up to
2 segments of a bi-lithic printhead.

The first stage of the page expansion pipeline is the CDU, LBD and TE. The CDU expands the
JPEG-compressed contone (typically CMYK) layer, the LBD expands the compressed bi-level layer
(typically K), and the TE encodes Netpage tags for later rendering (typically in IR or K ink). The output
from the first stage is a set of buffers: the CFU, SFU, and TFU. The CFU and SFU buffers are implemented in DRAM.
The second stage is the HCU, which dithers the contone layer, and composites position tags and the
bi-level spot0 layer over the resulting bi-level dithered layer. A number of options exist for the way in which
compositing occurs. Up to 6 channels of bi-level data are produced from this stage. Note that not all 6
channels may be present on the printhead. For example, the printhead may be CMY only, with K pushed
into the CMY channels and IR ignored. Alternatively, the position tags may be printed in K if IR ink is not
available (or for testing purposes).

The third stage (DNC) compensates for dead nozzles in the printhead by color redundancy and error dif- 
fusing dead nozzle data into surrounding dots. 

The resultant bi-level 6 channel dot-data (typically CMYK-IRF) is buffered and written out to a set of line
buffers stored in DRAM via the DWU.

Finally, the dot-data is loaded back from DRAM, and passed to the printhead interface via a dot FIFO. The
dot FIFO accepts data from the LLU at the system clock rate (pclk), while the PHI removes data from the
FIFO and sends it to the printhead at a rate of 2/3 times the system clock rate (see Section 9.1).
Figure 13. SoPEC System Top Level partition
(Figure labels. CPU sub-system: CPU, MMU, RDU, TIM, Boot ROM, ICU, PSS, CPR, GPIO, LSS Master, SCB, USB Device, USB PHY, DMA, ISI, USB Host; Motor Control, LSS, ISI, LED, etc. DRAM sub-system: DIU, eDRAM. Print Engine Pipeline sub-system: PCU, CDU, CFU, LBD, SFU, TE, TFU, HCU, DNC, DWU, LLU, PHI. Buses: CPU Subsystem Bus, DRAM bus, PEP Configuration Bus. Output: Bi-lithic Printhead.)



Doc: SoPEC_hardware_d0Sign S3 Proprietary Document 29 Nov 2002 

Version: 2.3 Page 39 



SoPEC : Hardware Design 



9.3 SoPEC Block Description 

Looking at Figure 13, the various units are described here in summary form:



Table 9. Units within SoPEC

Subsystem
  Acronym  Unit Name
           Description

DRAM
  DIU      DRAM interface unit
           Provides the interface for DRAM read and write access for the various SoPEC units, CPU and the SCB block. The DIU provides arbitration between competing units and controls DRAM access.
  DRAM     Embedded DRAM
           20 Mbits of embedded DRAM.

CPU
  CPU      Central Processing Unit
           CPU for system configuration and control.
  MMU      Memory Management Unit
           Limits access to certain memory address areas in CPU user mode.
  RDU      Real-time Debug Unit
           Facilitates the observation of the contents of most of the CPU addressable registers in SoPEC, in addition to some pseudo-registers, in realtime.
  TIM      General Timer
           Contains watchdog and general system timers.
  LSS      Low Speed Serial Interfaces
           Low level controller for interfacing with the QA chips.
  GPIO     General Purpose IOs
           General IO controller, with built-in Motor control unit, LED pulse units and de-glitch circuitry.
  ROM      Boot ROM
           16 KBytes of System Boot ROM code.
  ICU      Interrupt Controller Unit
           General Purpose interrupt controller with configurable priority, and masking.
  CPR      Clock, Power and Reset block
           Central Unit for controlling and generating the system clocks and resets and powerdown mechanisms.
  PSS      Power Save Storage
           Storage retained while system is powered down.
  USB      Universal Serial Bus Device
           USB device controller for interfacing with the Host USB.
  ISI      Inter-SoPEC Interface
           ISI controller for data and control communication with other SoPECs in a multi-SoPEC system.
  SCB      Serial Communication Block
           Contains both the USB and ISI blocks.

Print Engine Pipeline (PEP)
  PCU      PEP controller
           Provides external CPU with the means to read and write PEP Unit registers, and read and write DRAM in single 32-bit chunks.
  CDU      Contone decoder unit
           Expands JPEG compressed contone layer and writes decompressed contone to DRAM.
  CFU      Contone FIFO Unit
           Provides line buffering between CDU and HCU.
  LBD      Lossless Bi-level Decoder
           Expands compressed bi-level layer.
  SFU      Spot FIFO Unit
           Provides line buffering between LBD and HCU.
  TE       Tag encoder
           Encodes tag data into line of tag dots.
  TFU      Tag FIFO Unit
           Provides tag data storage between TE and HCU.
  HCU      Halftoner compositor unit
           Dithers contone layer and composites the bi-level spot0 and position tag dots.
  DNC      Dead Nozzle Compensator
           Compensates for dead nozzles by color redundancy and error diffusing dead nozzle data into surrounding dots.
  DWU      Dotline Writer Unit
           Writes out the 6 channels of dot data for a given printline to the line store DRAM.
  LLU      Line Loader Unit
           Reads the expanded page image from line store, formatting the data appropriately for the bi-lithic printhead.
  PHI      PrintHead Interface
           Is responsible for sending dot data to the bi-lithic printheads and for providing line synchronization between multiple SoPECs. Also provides test interface to printhead such as temperature monitoring and Dead Nozzle Identification.



9.4 Addressing scheme in SoPEC 

SoPEC must address:

• 20 Mbit DRAM. 

• PCU addressed registers in PEP. 

• CPU-subsystem addressed registers. 

SoPEC has a unified address space with the CPU capable of addressing all CPU-subsystem and PCU-bus 
accessible registers (in PEP) and all locations in DRAM. The CPU generates byte-aligned addresses for 
the whole of SoPEC. 

22 bits are sufficient to byte address the whole SoPEC address space. 
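The 22-bit figure can be checked with a little arithmetic. This sketch takes "20 Mbit" as 20 × 2^20 bits, which matches the DRAM region ending at 0x0027_FFFF in the memory map (Figure 14), and uses 0x002A_BFFF, the last PCU-mapped address, as the top of the used address space; the helper name `address_bits` is only illustrative.

```python
def address_bits(n_bytes):
    """Smallest number of address bits that can byte-address n_bytes locations."""
    return (n_bytes - 1).bit_length()


DRAM_BYTES = 20 * 2**20 // 8  # 20 Mbit of embedded DRAM = 2,621,440 bytes
TOP_OF_MAP = 0x002A_BFFF      # last PCU-mapped register address

print(address_bits(DRAM_BYTES))  # 22  (2**21 < 2,621,440 <= 2**22)
print(TOP_OF_MAP.bit_length())   # 22
```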

9.4.1 DRAM addressing scheme 

The embedded DRAM is composed of 256-bit words. However the CPU-subsystem may need to write 
individual bytes of DRAM. Therefore it was decided to make the DIU byte addressable. 22 bits are 
required to byte address 20 Mbits of DRAM. 

Most blocks read or write 256-bit words of DRAM. Therefore only the top 17 bits i.e. bits 21 to 5 are 
required to address 256-bit word aligned locations. 

The exceptions are:

• The CDU, which can write 64 bits, so only the top 19 address bits, i.e. bits 21:3, are required.

• The CPU-subsystem, which always generates a 22-bit byte-aligned DIU address but sends flags to the
DIU indicating whether it is an 8, 16 or 32-bit write.

All DIU accesses must be within the same 256-bit aligned DRAM word.
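The address fields described above can be sketched as simple shifts of the 22-bit byte address. The function names are hypothetical; the sketch shows the 256-bit word address (bits 21:5), the 64-bit address used by the CDU (bits 21:3), and a check that an 8, 16 or 32-bit CPU access stays within one 256-bit DRAM word.

```python
ADDR_MASK = (1 << 22) - 1  # 22-bit DIU byte address


def word_address_256(byte_addr):
    """Address of the 256-bit (32-byte) DRAM word: byte-address bits 21:5 (17 bits)."""
    return (byte_addr & ADDR_MASK) >> 5


def word_address_64(byte_addr):
    """Address of a 64-bit (8-byte) unit, as used by the CDU: bits 21:3 (19 bits)."""
    return (byte_addr & ADDR_MASK) >> 3


def within_one_dram_word(byte_addr, size_bytes):
    """True if an 8, 16 or 32-bit access stays inside a single 256-bit DRAM word."""
    if size_bytes not in (1, 2, 4):
        raise ValueError("CPU-subsystem accesses are 8, 16 or 32 bits wide")
    last_byte = byte_addr + size_bytes - 1
    return word_address_256(byte_addr) == word_address_256(last_byte)


print(hex(word_address_256(0x3F_FFFF)))  # 0x1ffff (17 bits)
print(within_one_dram_word(0x1E, 4))     # False: bytes 0x1E..0x21 span two words
```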



9.4.2 PEP Unit DRAM addressing



PEP Unit configuration registers which specify DRAM locations should specify 256-bit aligned DRAM
addresses, i.e. using address bits 21:5. Legacy blocks from PEC1, e.g. the LBD and TE, may need to specify
64-bit aligned DRAM addresses if these reused blocks' DRAM addressing is difficult to modify. These
64-bit aligned addresses require address bits 21:3. However, these 64-bit aligned addresses should be
programmed to start at a 256-bit DRAM word boundary.

Unlike PEC1, there are no constraints in SoPEC on data organization in DRAM except that all data
structures must start on a 256-bit DRAM boundary. If data stored is not a multiple of 256 bits then the last
word should be padded.
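The alignment and padding rule can be sketched as a round-up to a 32-byte (256-bit) boundary. The helper names are hypothetical, and `layout` simply places structures back to back for illustration; it is not a SoPEC allocator. `legacy_64bit_address` shows the bits 21:3 form a legacy block would be programmed with, while still starting on a 256-bit boundary.

```python
DRAM_WORD_BYTES = 32  # one 256-bit DRAM word


def align_up(byte_addr):
    """Round a byte address up to the next 256-bit DRAM word boundary."""
    return (byte_addr + DRAM_WORD_BYTES - 1) & ~(DRAM_WORD_BYTES - 1)


def layout(sizes, base=0):
    """Start addresses for structures placed consecutively, each padded to whole words."""
    starts = []
    addr = align_up(base)
    for size in sizes:
        starts.append(addr)
        addr += align_up(size)  # pad the last word of this structure
    return starts


def legacy_64bit_address(byte_addr):
    """Bits 21:3 for a legacy (PEC1-derived) block; the start must be 256-bit aligned."""
    if byte_addr % DRAM_WORD_BYTES:
        raise ValueError("legacy 64-bit addresses must start on a 256-bit boundary")
    return byte_addr >> 3


print(layout([100, 32, 1]))  # [0, 128, 160]
```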



9.4.3 CPU-bus addressed registers

The CPU-bus supports 32-bit word aligned read and write accesses with variable access timings. See
Section 11.4 for more details of the access protocol used on this bus. The CPU-bus does not currently
support byte reads and writes, but this can be added at a later date if required by imported IP.

9.4.4 PCU addressed registers in PEP

The PCU only supports 32-bit register reads and writes for the PEP blocks. As the PEP blocks only occupy
a subsection of the overall address map, and the PCU is explicitly selected by the MMU when a PEP block
is being accessed, the PCU does not need to perform a decode of the higher-order address bits. See
Table 11 for the PEP subsystem address map.
9.5 SoPEC Memory Map



9.5.1 Main memory map 



The system wide memory map is shown in Figure 14 below. The memory map is discussed in detail in
Section 11 Central Processing Unit (CPU).



[Figure 14 memory map, from the top of the address space down:
  0x002A_C000 to 0xFFFF_FFFF   Accesses in this area are not allowed and result in a bus error exception.
  0x002A_0000 to 0x002A_BFFF   PCU Mapped Registers
  0x0029_0000 to 0x0029_FFFF   Peripheral Registers
  0x0028_0000 to 0x0028_FFFF   ROM
    Accesses in the ROM and register areas are via the CPU bus and are controlled by permissions set in each peripheral.
  0x0000_0000 to 0x0027_FFFF   DRAM (DRAM regions)
    Accesses in this area are via the DIU bus and are controlled by permissions set in the MMU.]

Figure 14. Proposed SoPEC CPU memory map (not to scale) 
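As a sketch of how an address falls into the regions of Figure 14, the classifier below uses the region boundaries read from the figure; the `BusError` class and the `classify` function are hypothetical names used only for this illustration.

```python
class BusError(Exception):
    """Stands in for the bus error exception raised for unmapped accesses."""


# (first address, last address, region, access path) as read from Figure 14.
MEMORY_MAP = [
    (0x0000_0000, 0x0027_FFFF, "DRAM", "DIU bus"),
    (0x0028_0000, 0x0028_FFFF, "ROM", "CPU bus"),
    (0x0029_0000, 0x0029_FFFF, "Peripheral registers", "CPU bus"),
    (0x002A_0000, 0x002A_BFFF, "PCU mapped registers", "CPU bus"),
]


def classify(addr):
    """Return (region, access path) for a CPU address, or raise BusError."""
    for first, last, region, path in MEMORY_MAP:
        if first <= addr <= last:
            return region, path
    raise BusError(f"access to unmapped address {addr:#010x}")


print(classify(0x002A_7000))  # ('PCU mapped registers', 'CPU bus')
```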

9.5.2 CPU-bus peripherals address map

The address mapping for the peripherals attached to the CPU-bus is shown in Table 10 below. The MMU
performs the decode of cpu_adr[21:12] to generate the relevant cpu_block_select signal for each block.
The addressed blocks decode however many of the lower order bits of cpu_adr[11:2] are required to
address all the registers within the block.

Table 10. CPU-bus peripherals address map

Block_base   Address
MMU_base     0x0029_0000
TIM_base     0x0029_1000
LSS_base     0x0029_2000
GPIO_base    0x0029_3000
SCB_base     0x0029_4000
ICU_base     0x0029_5000
CPR_base     0x0029_6000
ROM_base     0x0029_7000
DIU_base     0x0029_8000
PSS_base     0x0029_9000
Reserved     0x0029_A000 to 0x0029_FFFF
PCU_base     0x002A_0000 to 0x002A_BFFF
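The MMU decode of cpu_adr[21:12] against Table 10 can be sketched as a table lookup, with the addressed block then using cpu_adr[11:2] as a register index. This is an illustrative model only; `cpu_block_select` here is a Python function returning a block name, not the hardware signal itself, and it only covers the CPU-bus range 0x0029_0000 to 0x002A_BFFF.

```python
# Block base addresses from Table 10, keyed by cpu_adr[21:12].
CPU_BUS_BLOCKS = {
    0x290: "MMU", 0x291: "TIM", 0x292: "LSS", 0x293: "GPIO", 0x294: "SCB",
    0x295: "ICU", 0x296: "CPR", 0x297: "ROM", 0x298: "DIU", 0x299: "PSS",
}
PCU_PAGES = range(0x2A0, 0x2AC)  # 0x002A_0000 to 0x002A_BFFF


def cpu_block_select(cpu_adr):
    """Name of the CPU-bus block selected by cpu_adr[21:12], or "Reserved"."""
    if not 0x0029_0000 <= cpu_adr <= 0x002A_BFFF:
        raise ValueError("address is outside the CPU-bus peripheral range")
    page = (cpu_adr >> 12) & 0x3FF
    if page in CPU_BUS_BLOCKS:
        return CPU_BUS_BLOCKS[page]
    if page in PCU_PAGES:
        return "PCU"
    return "Reserved"  # 0x0029_A000 to 0x0029_FFFF


def register_index(cpu_adr):
    """32-bit register index within a block, from cpu_adr[11:2]."""
    return (cpu_adr >> 2) & 0x3FF


print(cpu_block_select(0x0029_1008), register_index(0x0029_1008))  # TIM 2
```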



9.5.3 PCU Mapped Registers (PEP blocks) address map 

The PEP blocks are addressed via the PCU. From Figure 14, the PCU mapped registers are in the range
0x002A_0000 to 0x002A_BFFF. From Table 11 it can be seen that there are 12 sub-blocks within the PCU
address space. Therefore, only four bits are necessary to address each of the sub-blocks within the PEP
part of SoPEC. A further 12 bits may be used to address any configurable register within a PEP block. This
gives scope for 1024 configurable registers per sub-block (the PCU mapped registers are all 32-bit
addressed registers so the upper 10 bits are required to individually address them). This address will come
either from the CPU or from a command stored in DRAM. The bus is assembled as follows:

• address[15:12] = sub-block address,

• address[n:2] = register address within sub-block, only the number of bits required to decode the
registers within each sub-block are used,

• address[1:0] = byte address, unused as PCU mapped registers are all 32-bit addressed registers.

So for the case of the HCU, its addresses range from 0x7000 to 0x7FFF within the PEP subsystem or from
0x002A_7000 to 0x002A_7FFF in the overall system.



Table 11. PEP blocks address map

Block_base   Address
PCU_base     0x002A_0000
CDU_base     0x002A_1000
CFU_base     0x002A_2000
LBD_base     0x002A_3000
SFU_base     0x002A_4000
TE_base      0x002A_5000
TFU_base     0x002A_6000
HCU_base     0x002A_7000
DNC_base     0x002A_8000
DWU_base     0x002A_9000
LLU_base     0x002A_A000
PHI_base     0x002A_B000 to 0x002A_BFFF
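The PCU address assembly described above can be sketched as field extraction: address[15:12] selects the sub-block in Table 11 order and address[11:2] gives the 32-bit register index, with address[1:0] ignored. The function name `pcu_decode` is hypothetical.

```python
# Sub-blocks in Table 11 order; the list index equals address[15:12].
PEP_BLOCKS = ["PCU", "CDU", "CFU", "LBD", "SFU", "TE",
              "TFU", "HCU", "DNC", "DWU", "LLU", "PHI"]


def pcu_decode(address):
    """Split a PCU address into (sub-block name, 32-bit register index)."""
    pep_addr = address & 0xFFFF        # offset within the PEP subsystem
    sub_block = (pep_addr >> 12) & 0xF  # address[15:12]
    register = (pep_addr >> 2) & 0x3FF  # address[11:2]; address[1:0] unused
    if sub_block >= len(PEP_BLOCKS):
        raise ValueError(f"no PEP sub-block at address {address:#x}")
    return PEP_BLOCKS[sub_block], register


print(pcu_decode(0x002A_7000))  # ('HCU', 0)
print(pcu_decode(0x7FFC))       # ('HCU', 1023)
```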
9.6 Buffer management in SoPEC



As outlined in Section 9.1, SoPEC has a requirement to print 1 side every 2 seconds, i.e. 30 sides per
minute.



Approximately 2 Mbytes of DRAM are reserved for compressed page buffering in SoPEC. If a page is 
compressed to fit within 2 Mbyte then a complete page can be transferred to DRAM before printing. How- 
ever, the time to transfer 2 Mbyte using USB 1.1 is approximately 2 seconds. The worst case cycle time to 
print a page then approaches 4 seconds. This reduces the worst-case print speed to 15 pages per minute. 
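The print-rate figures follow from simple arithmetic. The sketch below uses the approximate 2 second transfer and 2 second print times given above; the overlapped case is an idealization that assumes the next page's data downloads while the current page prints.

```python
def sides_per_minute(seconds_per_side):
    return 60.0 / seconds_per_side


transfer_s = 2.0  # approx. time to send a 2 Mbyte compressed page over USB 1.1
print_s = 2.0     # target: one side every 2 seconds

# Whole page downloaded before printing starts: transfer and print are serialized.
serial = sides_per_minute(transfer_s + print_s)
# Download of the next page overlaps printing of the current page.
overlapped = sides_per_minute(max(transfer_s, print_s))

print(serial, overlapped)  # 15.0 30.0
```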



The SoPEC page-expansion blocks support the notion of page banding. The page can be divided into 
bands and another band can be sent down to SoPEC while we are printing the current band. 

Therefore we can start printing once at least one band has been downloaded. 

The band size granularity should be carefully chosen to allow efficient use of the USB bandwidth and 
DRAM buffer space. It should be small enough to allow seamless 30 sides per minute printing but not so 
small as to introduce excessive CPU overhead in orchestrating the data transfer and parsing the band head- 
ers. Band-finish interrupts have been provided to notify the CPU of free buffer space. It is likely that the 
Host PC will supervise the band transfer and buffer management instead of the SoPEC CPU. 

If SoPEC starts printing before the complete page has been transferred to memory, there is a risk of a buffer
underrun occurring if subsequent bands are not transferred to SoPEC in time, e.g. due to insufficient USB
bandwidth caused by another USB peripheral consuming USB bandwidth. A buffer underrun occurs if a
line synchronisation pulse is received before a line of data has been transferred to the printhead, and causes
the print job to fail at that line. If there is no risk of buffer underrun then printing can safely start once at
least one band has been downloaded.

If there is a risk of a buffer underrun occurring due to an interruption of compressed page data transfer,
then the safest approach is to only start printing once we have loaded up the data for a complete page. This
means that a worst case latency in the region of 2 seconds (with USB 1.1) will be incurred before printing
the first page. Subsequent pages will take 2 seconds to print, giving us the required sustained printing rate
of 30 sides per minute.
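The trade-off between starting early and waiting for a complete page can be explored with a simple timing model. This is a conservative sketch under stated assumptions: bands are printed back to back, each band must be fully downloaded before its first line is due, and the band arrival and print times are hypothetical inputs rather than SoPEC data.

```python
from itertools import accumulate


def first_underrun(arrival_s, print_s, start_s):
    """Index of the first band not fully downloaded when it is due to print,
    or None if the page prints without a buffer underrun.

    arrival_s: time each band finishes downloading
    print_s:   time taken to print each band
    start_s:   time printing of the first band begins
    """
    # Each band starts printing when all earlier bands have been printed.
    due = [start_s + t for t in accumulate(print_s, initial=0.0)]
    for index, (arrived, due_at) in enumerate(zip(arrival_s, due)):
        if arrived > due_at:
            return index
    return None


durations = [0.5, 0.5, 0.5, 0.5]            # four bands, 2 s page
arrivals = [0.5, 1.2, 1.5, 2.0]             # second band is delayed
print(first_underrun(arrivals, durations, 0.5))  # 1: band 1 arrives late
print(first_underrun(arrivals, durations, 2.0))  # None: whole page buffered first
```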

A Storage SoPEC (Section 7.2.5) could be added to the system to provide guaranteed bandwidth data 
delivery. The print system could also be constructed using an ISI-Bridge chip (Section 7.2.6) to provide 
guaranteed data delivery. 

The most efficient page banding strategy is likely to be determined on a per page/ print job basis and so 
SoPEC will support the use of bands of any size. 



9.6.1 Page buffering

9.6.2 Band buffering
10 SoPEC Use Cases



10.1 Introduction



This chapter is intended to give an overview of a representative set of scenarios or use cases which SoPEC
can perform. SoPEC is by no means restricted to the particular use cases described here.

In this chapter we discuss SoPEC use cases under four headings:

1) Normal operation use cases.

2) Security use cases.

3) Miscellaneous use cases.

4) Failure mode use cases.

Use cases for both single and multi-SoPEC systems are outlined.
Some tasks may be composed of a number of sub-tasks.

The realtime requirements for SoPEC software tasks are discussed in "Central Processing Unit (CPU)"
under Section 11.3 Realtime requirements.



10.2 Normal operation in a single SoPEC System with USB Host connection 



SoPEC operation is broken up into a number of sections which are outlined below. Buffer management in 
a SoPEC system is normally performed by the Host. 



10.2.1 Powerup 

Powerup describes SoPEC initialisation following an external reset or the watchdog timer system reset. 
A typical powerup sequence is: 



1) Execute reset sequence for complete SoPEC.

2) CPU boot from ROM.

3) Basic configuration of CPU peripherals, SCB and DIU. DRAM initialisation. USB wakeup.

4) Download and authentication of program (see Section 10.5.2).

5) Store reusable authentication results in Power-Safe Storage (PSS).

6) Execution of program from DRAM.

7) Retrieve operating parameters from PRINTER_QA and authenticate operating parameters.

8) Download and authenticate any further datasets.



10.2.2 USB wakeup

The CPU can put different sections of SoPEC into sleep mode by writing to registers in the CPR block
(chapter 16). Normally the CPU sub-system and the DRAM will be put in sleep mode but the SCB and
power-safe storage (PSS) will still be enabled.

Wakeup describes SoPEC recovery from sleep mode with the SCB and power-safe storage (PSS) still
enabled. In a single SoPEC system, wakeup can be initiated following a USB reset from the SCB.

A typical USB wakeup sequence is:

1) Execute reset sequence for sections of SoPEC in sleep mode.

2) CPU boot from ROM, if CPU-subsystem was in sleep mode.

3) Basic configuration of CPU peripherals and DIU, and DRAM initialisation, if required.
4) Download and authentication of program using results in Power-Safe Storage (PSS) (see Section
10.5.2).

5) Execution of program from DRAM.

6) Retrieve operating parameters from PRINTER_QA and authenticate operating parameters.

7) Download and authenticate using results in PSS of any further datasets (programs).



10.2.3 Print initialization 

This sequence is typically performed at the start of a print job following powerup or wakeup: 

1) Check amount of ink remaining via QA chips. 

2) Download static data e.g. dither matrices, dead nozzle tables from Host to DRAM.

3) Check printhead temperature, if required, and configure printhead with firing pulse profile etc. 
accordingly. 

4) Initiate printhead pre-heat sequence, if required. 

10.2.4 First page download 

Buffer management in a SoPEC system is normally performed by the Host. 
First page, first band download and processing: 

1) The Host communicates to the SoPEC CPU over the USB to check that DRAM space remaining is
sufficient to download the first band.

2) The Host downloads the first band (with the page header) to DRAM.

3) When the complete page header has been downloaded the SoPEC CPU processes the page header,
calculates PEP register commands and writes directly to PEP registers or to DRAM.

4) If PEP register commands have been written to DRAM, execute PEP commands from DRAM via
PCU.

Remaining bands download and processing: 

1) Check DRAM space remaining is sufficient to download the next band. 

2) Download the next band with the band header to DRAM.

3) When the complete band header has been downloaded, process the band header according to 
whichever band-related register updating mechanism is being used. 

10.2.5 Start printing 

1) Wait until at least one band of the first page has been downloaded. 

One approach is to only start printing once we have loaded up the data for a complete page. If we
start printing before the complete page has been transferred to memory we run the risk of a buffer
underrun occurring because compressed page data was not transferred to SoPEC in time, e.g. due to
insufficient USB bandwidth caused by another USB peripheral consuming USB bandwidth.

2) Start all the PEP Units by writing to their Go registers, via PCU commands executed from DRAM 
or direct CPU writes. A rapid startup order for the PEP units is outlined in Table 12. 



Table 12. Typical PEP Unit startup order for printing a page.

Step   Unit
1      DNC
2      DWU
3      HCU
4      PHI
5      LLU
6      CFU, SFU, TFU
7      CDU
8      TE, LBD



3) Print ready interrupt occurs (from PHI). 

4) Start motor control, if first page, otherwise feed the next page. This step could occur before the print
ready interrupt.

5) Drive LEDs, monitor paper status.

6) Wait for page alignment via page sensor(s) GPIO interrupt.

7) CPU instructs PHI to start producing line syncs and hence commence printing, or wait for an
external device to produce line syncs.
8) Continue to download bands and process page and band headers for next page. 

10.2.6 Next page(s) download 

As for first page download, performed during printing of current page. 

10.2.7 Between bands

When the finished band flags are asserted, band-related registers in the CDU, LBD and TE need to be
re-programmed before the subsequent band can be printed. This can be via PCU commands from DRAM.
Typically only 3-5 commands per decompression unit need to be executed. These registers can also be
reprogrammed directly by the CPU or, most likely, by updating from shadow registers. The finished band
flag interrupts the CPU to tell the CPU that the area of memory associated with the band is now free.
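The space check before each band download, together with the memory released by each finished-band interrupt, can be modelled as simple bookkeeping on the side that manages the buffer (normally the Host). The `BandBuffer` class and its method names are hypothetical, and the model assumes bands finish printing in the order they were downloaded.

```python
from collections import deque


class BandBuffer:
    """Hypothetical model of the DRAM area reserved for compressed bands."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.resident = deque()  # sizes of bands held in DRAM, oldest first
        self.in_use = 0

    def free_space(self):
        return self.capacity - self.in_use

    def try_download(self, band_bytes):
        """Download a band only if enough DRAM space remains for it."""
        if band_bytes > self.free_space():
            return False
        self.resident.append(band_bytes)
        self.in_use += band_bytes
        return True

    def band_finished(self):
        """Finished-band interrupt: the oldest band's memory is free again."""
        self.in_use -= self.resident.popleft()


MBYTE = 2**20
buf = BandBuffer(2 * MBYTE)   # roughly 2 Mbytes reserved for compressed data
buf.try_download(MBYTE)
buf.try_download(MBYTE)
print(buf.try_download(1))    # False: wait for a finished-band interrupt
buf.band_finished()
print(buf.try_download(MBYTE // 2))  # True
```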

10.2.8 During page print 

Typically during page printing ink usage is communicated to the QA chips. 

1) Calculate ink printed (from PHI). 

2) Decrement ink remaining (via QA chips). 

3) Check amount of ink remaining (via QA chips). This operation may be better performed while the
page is being printed rather than at the end of the page.

10.2.9 Page finish 

These operations are typically performed when the page is finished: 

1) Page finished interrupt occurs from PHI.

2) Shutdown the PEP blocks by de-asserting their Go registers. A typical shutdown order is defined in 
Table 13. This will set the PEP Unit state-machines to their idle states without resetting their config- 
uration registers. 

3) Communicate ink usage to QA chips, if required. 
Table 13. End of page shutdown order for PEP Units (TBD).

Step   Unit
1      PHI (will shut down by itself in the normal case at the end of a page)
2      DWU (shutting this down stalls the DNC and therefore the HCU and above)
3      LLU (should already be halted due to PHI at end of last line of page)
4      TE (this is the only dot supplier likely to be running, halted by the HCU)
5      CDU (this is likely to already be halted due to end of contone band)
6      CFU, SFU, TFU, LBD (order unimportant, and should already be halted due to end of band)
7      HCU, DNC (order unimportant, should already have halted)
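The startup order of Table 12 and the shutdown order of Table 13 can be captured as ordered groups, with units in the same group written in any order. In this sketch `write_go` is a hypothetical callback standing in for either a PCU command executed from DRAM or a direct CPU write; the Go register addresses themselves are not specified here.

```python
# Groups from Table 12 (startup) and Table 13 (shutdown).
STARTUP_ORDER = [["DNC"], ["DWU"], ["HCU"], ["PHI"], ["LLU"],
                 ["CFU", "SFU", "TFU"], ["CDU"], ["TE", "LBD"]]
SHUTDOWN_ORDER = [["PHI"], ["DWU"], ["LLU"], ["TE"], ["CDU"],
                  ["CFU", "SFU", "TFU", "LBD"], ["HCU", "DNC"]]


def sequence_go(order, value, write_go):
    """Write value (1 = start, 0 = stop) to each unit's Go register in order."""
    for group in order:
        for unit in group:
            write_go(unit, value)


log = []
sequence_go(STARTUP_ORDER, 1, lambda unit, value: log.append((unit, value)))
sequence_go(SHUTDOWN_ORDER, 0, lambda unit, value: log.append((unit, value)))
print(log[0], log[11])  # ('DNC', 1) ('PHI', 0)
```

Both tables cover the same eleven units; the PCU itself is not started or stopped through a Go register in either sequence.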



10.2.10 Start of next page

These operations are typically performed before printing the next page:

1) Re-program the PEP Units via PCU command processing from DRAM based on page header.

2) Go to Start printing.

10.2.11 End of document

1) Stop motor control. 

10.2.12 Powerdown 

In this mode SoPEC is no longer powered.

1) Instruct Host PC via USB that SoPEC is about to power down. 



10.2.13 Sleep 



The CPU can put different sections of SoPEC into sleep mode by writing to registers in the CPR block 
described in Section 16. 

1) Instruct Host PC via USB that SoPEC is about to sleep.

2) Put SoPEC into defined sleep mode. 
10.3 Normal operation in a Multi-SoPEC System - ISIMaster SoPEC

In a multi-SoPEC system the Host generally manages program and compressed page download to all the 
SoPECs. Inter-SoPEC communication is over the ISI link which will add a latency. 

In the case of a multi-SoPEC system with a USB 1.1 connection, the SoPEC with the USB connection is
the ISIMaster. The ISI-Bridge chip is the ISIMaster in the case of an ISI-Bridge SoPEC configuration.

In a multi-SoPEC system one of the SoPECs will be the PrintMaster. This SoPEC must manage and
control sensors and actuators e.g. motor control. These sensors and actuators could be distributed over all
the SoPECs in the system. An ISIMaster SoPEC may also be the PrintMaster SoPEC.

In a multi-SoPEC system each printing SoPEC will generally have its own PRINTER_QA chip (or at least
access to a PRINTER_QA chip that contains the SoPEC's SoPEC_id_key) to validate operating
parameters and ink usage. The results of these operations may be communicated to the PrintMaster SoPEC.

In general the ISIMaster may need to be able to: 

• Send messages to the ISISlaves which will cause the ISISlaves to send their status to the ISIMaster.

• Instruct the ISISlaves to perform certain operations.

As the ISI is an insecure interface, commands issued over the ISI are regarded as user mode commands.
Supervisor mode code running on the SoPEC CPUs will allow or disallow these commands. The software
protocol needs to be constructed with this in mind.

Existing requirements indicate that it is sufficient for the ISIMaster to initiate all communication with the
ISISlaves.

SoPEC operation is broken up into a number of sections which are outlined below.

10.3.1 Powerup 

Powerup describes SoPEC initialisation following an external reset or the watchdog timer system reset.

1) Execute reset sequence for complete SoPEC.

2) CPU boot from ROM.

3) Basic configuration of CPU peripherals, SCB and DIU. DRAM initialisation. USB wakeup.

4) SoPEC identification: activity on USB end-points 2-4 indicates that it is the ISIMaster.

5) Download and authentication of program (see Section 10.5.3).

6) Store reusable cryptographic results in Power-Safe Storage (PSS).

7) Execution of program from DRAM.

8) Retrieve operating parameters from PRINTER_QA and authenticate operating parameters.

9) Download and authenticate any further datasets (programs).

10) The initial dataset may be broadcast to all the ISISlaves.

11) The ISIMaster SoPEC then waits for a short time to allow the authentication to take place on the
ISISlave SoPECs.

12) Each ISISlave SoPEC is polled for the result of its program code authentication process.

13) If all ISISlaves report successful authentication, the OEM code module can be distributed and
authenticated. OEM code will most likely reside on one SoPEC.
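Steps 11 to 13 amount to a wait, a poll of every ISISlave initiated by the ISIMaster, and a go/no-go decision before OEM code is distributed. The sketch below models only that control flow; `poll_auth_result` is a hypothetical callback standing in for an ISI status request, and no ISI message format is implied.

```python
import time


def authenticate_slaves(slave_ids, poll_auth_result, settle_s=0.0):
    """ISIMaster side of powerup steps 11-13, as a control-flow sketch.

    Returns (ok, failed_ids); OEM code distribution should proceed only if ok.
    """
    time.sleep(settle_s)  # step 11: give the ISISlaves time to authenticate
    failed = [sid for sid in slave_ids if not poll_auth_result(sid)]  # step 12
    return not failed, failed  # step 13: all must report success


ok, failed = authenticate_slaves([1, 2, 3], lambda sid: sid != 2)
print(ok, failed)  # False [2]
```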

10.3.2 USB wakeup 

The CPU can put different sections of SoPEC into sleep mode by writing to registers in the CPR block 
[16]. Normally the CPU sub-system and the DRAM will be put in sleep mode but the SCB and power-safe
storage (PSS) will still be enabled. 
Wakeup describes SoPEC recovery from sleep mode with the SCB and power-safe storage (PSS) still
enabled. For an ISIMaster SoPEC, wakeup can be initiated following a USB reset from the SCB. 
A typical USB wakeup sequence is: 

1) Execute reset sequence for sections of SoPEC in sleep mode. 

2) CPU boot from ROM, if CPU-subsystem was in sleep mode. 

3) Basic configuration of CPU peripherals and DIU, and DRAM initialisation, if required. 

4) SoPEC identification by activity on USB end-points 2-4 indicates it is the ISIMaster. 

5) Download and authentication of program using results in Power-Safe Storage (PSS) (see Section 
10.5.3). 

6) Execution of program from DRAM. 

7) Retrieve operating parameters from PRINTER_QA and authenticate operating parameters.

8) Download and authenticate any further datasets (programs) using results in Power-Safe Storage 
(PSS) (see Section 10.5.3). 

9) Following steps as per Powerup. 

10.3.3 Print Initialization 

This sequence is typically performed at the start of a print job following powerup or wakeup: 

1) Check amount of ink remaining via QA chips which may be present on an ISISlave SoPEC.

2) Download static data e.g. dither matrices, dead nozzle tables from Host to DRAM.

3) Check printhead temperature, if required, and configure printhead with firing pulse profile etc.
accordingly. Instruct ISISlaves to also perform this operation.

4) Initiate printhead pre-heat sequence, if required. Instruct ISISlaves to also perform this operation.

10.3.4 First page download 

Buffer management in a SoPEC system is normally performed by the Host.

1) The Host communicates to the SoPEC CPU over the USB to check that DRAM space remaining is 
sufficient to download the first band. 

2) The Host downloads the first band (with the page header) to DRAM. 

3) When the complete page header has been downloaded the SoPEC CPU processes the page header, 
calculates PEP register commands and writes directly to PEP registers or to DRAM.

4) If PEP register commands have been written to DRAM, execute PEP commands from DRAM via 
PCU. 

Poll ISISlaves for DRAM status and download compressed data to ISISlaves.

Remaining first page bands download and processing:

1) Check DRAM space remaining is sufficient to download the next band.

2) Download the next band with the band header to DRAM.

3) When the complete band header has been downloaded, process the band header according to
whichever band-related register updating mechanism is being used.

Poll ISISlaves for DRAM status and download compressed data to ISISlaves.

10.3.5 Start printing 

1) Wait until at least one band of the first page has been downloaded.

2) Start all the PEP Units by writing to their Go registers, via PCU commands executed from DRAM
or direct CPU writes, in the suggested order defined in Table 12.

3) Print ready interrupt occurs (from PHI). Poll ISISlaves until each reports a print ready interrupt.
4) Start motor control (which may be on an ISISlave SoPEC), if first page, otherwise feed the next
page. This step could occur before the print ready interrupt.

5) Drive LEDs, monitor paper status (which may be on an ISISlave SoPEC).

6) Wait for page alignment via page sensor(s) GPIO interrupt (which may be on an ISISlave SoPEC).

7) CPU instructs PHI to start producing master line syncs, or wait for an external device to produce 
line syncs. 

8) Continue to download bands and process page and band headers for next page. 

10.3.6 Next page(s) download 

As for first page download, performed during printing of current page. 



10.3.7 Between bands 

When the finished band flags are asserted, band-related registers in the CDU, LBD and TE need to be
re-programmed. This can be via PCU commands from DRAM. Typically only 3-5 commands per
decompression unit need to be executed. These registers can also be reprogrammed directly by the CPU or
by updating from shadow registers. The finished band flag interrupts to the CPU tell the CPU that the area
of memory associated with the band is now free.



10.3.8 During page print 

Typically during page printing ink usage is communicated to the QA chips.

1) Calculate ink printed (from PHI).

2) Decrement ink remaining (via QA chips).

3) Check amount of ink remaining (via QA chips). This operation may be better performed while the
page is being printed rather than at the end of the page.
page is being printed rather than at the end of the page. 



10.3.9 Page finish 

These operations are typically performed when the page is finished: 

1) Page finished interrupt occurs from PHI. Poll ISISlaves for page finished interrupts.

2) Shutdown the PEP blocks by de-asserting their Go registers in the suggested order in Table 13. This
will set the PEP Unit state-machines to their startup states.

3) Communicate ink usage to QA chips, if required.



10.3.10 Start of next page 



These operations are typically performed before printing the next page: 

1) Re-program the PEP Units via PCU command processing from DRAM based on page header.

2) Go to Start printing.

10.3.11 End of document

1) Stop motor control. This may be on an ISISlave SoPEC.



10.3.12 Powerdown 

In this mode SoPEC is no longer powered. 

1) Instruct Host PC via USB that SoPEC system is about to power down. 

2) Instruct ISISlave SoPECs to powerdown. 

3) Powerdown ISIMaster SoPEC. 
10.3.13 Sleep



The CPU can put different sections of SoPEC into sleep mode by writing to registers in the CPR block 
[16]. 

1) Instruct Host PC via USB which parts of SoPEC system are about to sleep. 

2) Put defined SoPECs into defined sleep modes. 
10.4 Normal operation in a Multi-SoPEC System - ISISlave SoPEC

This section outlines the typical operation of an ISISlave SoPEC in a multi-SoPEC system. The ISIMaster
can be another SoPEC or an ISI-Bridge chip. The ISISlave communicates with the Host via the ISIMaster.
Buffer management in a SoPEC system is normally performed by the Host. 

10.4.1 Powerup 

Powerup describes SoPEC initialisation following an external reset or the watchdog timer system reset.

A typical powerup sequence is: 

1) Execute reset sequence for complete SoPEC. 

2) CPU boot from ROM. 

3) Basic configuration of CPU peripherals, SCB and DIU. DRAM initialisation.

4) Download and authentication of program (see Section 10.5.3). 

5) Store reusable cryptographic results in Power-Safe Storage (PSS). 

6) Execution of program from DRAM. 

7) Retrieve operating parameters from PRINTER_QA and authenticate operating parameters. 

8) SoPEC identification by sampling GPIO pins to determine ISIId. Communicate ISIId to ISIMaster. 

9) Download and authenticate any further datasets.

10.4.2 ISI wakeup

The CPU can put different sections of SoPEC into sleep mode by writing to registers in the CPR block 
[16]. Normally the CPU sub-system and the DRAM will be put in sleep mode but the SCB and power-safe 
storage (PSS) will still be enabled. 

Wakeup describes SoPEC recovery from sleep mode with the SCB and power-safe storage (PSS) still 
enabled. In an ISISlave SoPEC, wakeup can be initiated following an ISI reset from the SCB.

A typical ISI wakeup sequence is:

1) Execute reset sequence for sections of SoPEC in sleep mode.

2) CPU boot from ROM, if CPU-subsystem was in sleep mode.

3) Basic configuration of CPU peripherals and DIU, and DRAM initialisation, if required.

4) Download and authentication of program using results in Power-Safe Storage (PSS) (see Section
10.5.3).

5) Execution of program from DRAM.

6) Retrieve operating parameters from PRINTER_QA and authenticate operating parameters.

7) SoPEC identification by sampling GPIO pins to determine ISIId. Communicate ISIId to ISIMaster.

8) Download and authenticate any further datasets.

10.4.3 Print Initialization 

This sequence is typically performed at the start of a print job following powerup or wakeup: 

1) Check amount of ink remaining via QA chips.

2) Download static data e.g. dither matrices, dead nozzle tables from ISIMaster to DRAM. 

3) Check printhead temperature, if required, and configure printhead with firing pulse profile etc. 
accordingly. 

4) Initiate printhead pre-heat sequence, if required. 
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10.4.4 First page download 

Buffer management in a SoPEC system is normally performed by the Host via the ISIMaster.

1) Check DRAM space remaining is sufficient to download the first band. 

2) The Host downloads the first band (with the page header) to DRAM via the ISIMaster. 

3) When the complete page header has been downloaded, process the page header, calculate PEP reg- 
ister commands and write directly to PEP registers or to DRAM. 

4) If PEP register commands have been written to DRAM, execute PEP commands from DRAM via 
PCU. 

Remaining first page bands download and processing: 

1) Check DRAM space remaining is sufficient to download the next band. 

2) The Host downloads the next band (with the band header) to DRAM via the ISIMaster.

3) When the complete band header has been downloaded, process the band header according to 
whichever band-related register updating mechanism is being used. 

10.4.5 Start printing 

1) Wait until at least one band of the first page has been downloaded. 

2) Start all the PEP Units by writing to their Go registers, via PCU commands executed from DRAM
or direct CPU writes, in the order defined in Table 12.

3) Print ready interrupt occurs (from PHI). Communicate to ISIMaster via ISI link. 

4) Start motor control, if attached to this ISISlave, when requested by the ISIMaster, if this is the first
page; otherwise feed the next page. This step could occur before the print ready interrupt.

5) Drive LEDs and monitor paper status, if on this ISISlave SoPEC, when requested by the ISIMaster.

6) Wait for page alignment via page sensor(s) GPIO interrupt, if on this ISISlave SoPEC, and send to 
ISIMaster. 

7) Wait for line sync and commence printing. 

8) Continue to download bands and process page and band headers for next page. 

10.4.6 Next page(s) download 

As for first page download, performed during printing of the current page.

10.4.7 Between bands 

When the finished band flags are asserted, band related registers in the CDU, LBD and TE need to be
reprogrammed. This can be done via PCU commands from DRAM. Typically only 3-5 commands per
decompression unit need to be executed. These registers can also be reprogrammed directly by the CPU or by
updating from shadow registers. The finished band flag interrupts to the CPU tell the CPU that the area of
memory associated with the band is now free.
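The space check before each band download (10.4.4) and the release of band memory on the finished band interrupt (10.4.7) together amount to FIFO management of the compressed band store. The following is a minimal C sketch under stated assumptions: the band store is modelled as one contiguous ring of DRAM, bands are freed strictly in download order, and all names (`band_store_t`, `band_store_alloc`, `band_store_release`, `MAX_BANDS`) are hypothetical rather than part of the SoPEC software.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring-buffer model of the compressed band store in DRAM.
 * PEP consumes bands in download order, so space is freed FIFO. */
#define MAX_BANDS 16

typedef struct {
    uint32_t base, size;           /* band store region in DRAM (bytes) */
    uint32_t head;                 /* offset where the next band is written */
    uint32_t used;                 /* bytes holding bands not yet printed */
    uint32_t band_len[MAX_BANDS];  /* FIFO of outstanding band lengths */
    int first, count;
} band_store_t;

void band_store_init(band_store_t *bs, uint32_t base, uint32_t size) {
    bs->base = base; bs->size = size;
    bs->head = 0; bs->used = 0; bs->first = 0; bs->count = 0;
}

/* "Check DRAM space remaining is sufficient to download the next band." */
bool band_store_can_accept(const band_store_t *bs, uint32_t len) {
    return bs->count < MAX_BANDS && len <= bs->size - bs->used;
}

/* Reserve space for a band and return its DRAM start address in *addr.
 * A band may wrap around the end of the ring. */
bool band_store_alloc(band_store_t *bs, uint32_t len, uint32_t *addr) {
    if (!band_store_can_accept(bs, len)) return false;
    *addr = bs->base + bs->head;
    bs->head = (bs->head + len) % bs->size;
    bs->used += len;
    bs->band_len[(bs->first + bs->count) % MAX_BANDS] = len;
    bs->count++;
    return true;
}

/* Called from the finished band interrupt: the oldest band's memory is free. */
void band_store_release(band_store_t *bs) {
    if (bs->count == 0) return;
    bs->used -= bs->band_len[bs->first];
    bs->first = (bs->first + 1) % MAX_BANDS;
    bs->count--;
}
```

Keeping the space check separate from the allocation mirrors the use cases: step 1 of each band download only asks whether the band fits, and the finished band interrupt is the only place that returns memory.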

10.4.8 During page print 

Typically, ink usage is communicated to the QA chips during page printing.

1) Calculate ink printed (from PHI). 

2) Decrement ink remaining (via QA chips). 

3) Check amount of ink remaining (via QA chips). This operation may be better performed while the
page is being printed rather than at the end of the page.
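The ink accounting in steps 1 to 3 can be illustrated as below. This is a hypothetical sketch: the number of planes, the abstract "dot units" and the local counters are assumptions made for illustration. In the real system the decrement and the check are operations on the QA chips and their signed counters, not local arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PLANES 6   /* hypothetical number of ink planes */

/* Subtract the dots fired on this page (as reported by the PHI) from the
 * remaining-ink counters, saturating at zero, and report whether every
 * plane still has at least low_threshold units left. */
bool ink_account_page(uint64_t remaining[NUM_PLANES],
                      const uint64_t dots_fired[NUM_PLANES],
                      uint64_t low_threshold) {
    bool ok = true;
    for (int p = 0; p < NUM_PLANES; p++) {
        remaining[p] = dots_fired[p] >= remaining[p] ? 0 : remaining[p] - dots_fired[p];
        if (remaining[p] < low_threshold) ok = false;   /* report to Host PC */
    }
    return ok;
}
```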
10.4.9 Page finish

These operations are typically performed when the page is finished: 

1) Page finished interrupt occurs from PHI. Communicate page finished interrupt to ISIMaster. 

2) Shut down the PEP blocks by de-asserting their Go registers in the suggested order in Table 13. This
will set the PEP Unit state machines to their startup states.

3) Communicate ink usage to QA chips, if required. 

10.4.10 Start of next page

These operations are typically performed before printing the next page: 

1) Re-program the PEP Units via PCU command processing from DRAM based on the page header.

2) Go to Start printing. 

10.4.11 End of document

Stop motor control, if attached to this ISISlave, when requested by ISIMaster. 

10.4.12 Powerdown 

In this mode SoPEC is no longer powered.

1) Powerdown ISISlave SoPEC when instructed by ISIMaster. 

10.4.13 Sleep 

The CPU can put different sections of SoPEC into sleep mode by writing to registers in the CPR block
[16]. 

1) Put SoPEC into defined sleep modes. 
10.5 Security Use Cases

Please see the *SoPEC Security Overview* [9] document for a more complete description of SoPEC
security issues. The SoPEC boot operation is described in the ROM chapter of the SoPEC hardware design
specification, Section 17.2.

10.5.1 Communication with the QA chips

Communication between SoPEC and the QA chips (i.e. INK_QA and PRINTER_QA) will take place on
at least a per power cycle and per page basis. Communication with the QA chips has three principal
purposes: validating the presence of genuine QA chips (i.e. the printer is using approved consumables),
validating the amount of ink remaining in the cartridge, and authenticating the operating parameters for
the printer. After each page has been printed, SoPEC is expected to communicate the number of dots fired
per ink plane to the QA chipset. SoPEC may also initiate decoy communications with the QA chips from
time to time.

Process: 

• When validating ink consumption, SoPEC is expected to act principally as a conduit between the
PRINTER_QA and INK_QA chips and to take certain actions (basically enable or disable printing and
report status to the Host PC) based on the result. The communication channels are insecure but all
traffic is signed to guarantee authenticity.

Known Weaknesses 

• All communication to the QA chips is over the LSS interfaces using a serial communication protocol.
This is open to observation and so the communication protocol could be reverse engineered. In this
case both the PRINTER_QA and INK_QA chips could be replaced by impostor devices (e.g. a single
FPGA) that successfully emulate the communication protocol. As this would require physical
modification of each printer, this is considered to be an acceptably low risk. Any messages that are not
signed by one of the symmetric keys (such as the SoPEC_id_key) could be reverse engineered. The
impostor device must also have access to the appropriate keys to crack the system.

• If the secret keys in the QA chips are exposed or cracked, then the system, or parts of it, is
compromised.

Assumptions: 

[1] The QA chips are not involved in the authentication of downloaded SoPEC code.

[2] The QA chip in the ink cartridge (INK_QA) does not directly affect the operation of the cartridge in
any way, i.e. it does not inhibit the flow of ink etc.

[3] The INK_QA and PRINTER_QA chips are identical in their virgin state. They only become an
INK_QA or PRINTER_QA after their FlashROM has been programmed.

10.5.2 Authentication of downloaded code in a single SoPEC system

Process: 

1) SoPEC identification by activity on USB end-points 2-4 indicates it is the ISIMaster.

2) The program is downloaded to the embedded DRAM. 

3) The CPU calculates a SHA-1 hash digest of the downloaded program.

4) The ResetSrc register in the CPR block is read to determine whether or not a power-on reset 
occurred. 

5) If a power-on reset occurred, the signature of the downloaded code (which needs to be in a known
location such as the first or last N bytes of the downloaded code) is decrypted using the Silverbrook
public boot0key stored in ROM. This decrypted signature is the expected SHA-1 hash of the
accompanying program. The encryption algorithm is likely to be a public key algorithm such as
RSA. If a power-on reset did not occur, the expected SHA-1 hash is retrieved from the PSS and
the compute-intensive decryption is not required.

6) The calculated and expected hash values are compared and, if they match, the program's
authenticity has been verified.

7) If the hash values do not match then the Host PC is notified of the failure and software may decide 
to put the SoPEC device into powerdown mode. 

8) If the hash values match then the CPU starts executing the downloaded program.

9) If, as is very likely, the downloaded program wishes to download subsequent programs (such as
OEM code), it is responsible for ensuring the authenticity of everything it downloads. The
downloaded program may contain public keys that are used to authenticate subsequent downloads,
thus forming a hierarchy of authentication. The SoPEC ROM does not control these authentications;
it is solely concerned with verifying that the first program downloaded has come from a trusted
source.

10) At some subsequent point OEM code starts executing. The Silverbrook supervisor code acts as an 
O/S to the OEM user mode code. The OEM code must access most SoPEC functionality via system 
calls to the Silverbrook code. 

11) The OEM code is expected to perform some simple 'turn on the lights' tasks after which the Host
PC is informed that the printer is ready to print and the Start Printing use case comes into play.
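Steps 3 to 6 above (hash the downloaded program, obtain the expected hash either by decrypting the signature after a power-on reset or from the PSS otherwise, then compare) can be sketched in C as follows. The SHA-1 routine is a straightforward FIPS 180 implementation written for illustration. The signature decryption is a hypothetical callback (`sig_decrypt_fn`) standing in for the RSA operation with the public boot key held in ROM, and `pss_boot_cache_t` is an assumed PSS layout. Falling back to the signature check when the PSS holds no valid result is a design choice of this sketch, not something stated above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard SHA-1 (FIPS 180), written out for illustration. */
static uint32_t rol32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

static void sha1_block(uint32_t h[5], const uint8_t *p) {
    uint32_t w[80];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
               (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
    for (int i = 16; i < 80; i++)
        w[i] = rol32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int i = 0; i < 80; i++) {
        uint32_t f, k;
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
        uint32_t t = rol32(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rol32(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

void sha1(const uint8_t *msg, size_t len, uint8_t out[20]) {
    uint32_t h[5] = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu, 0x10325476u, 0xC3D2E1F0u};
    uint8_t blk[64];
    size_t i = 0;
    for (; i + 64 <= len; i += 64) sha1_block(h, msg + i);
    size_t rem = len - i;
    memset(blk, 0, sizeof blk);
    if (rem) memcpy(blk, msg + i, rem);
    blk[rem] = 0x80;                       /* padding: a single 1 bit */
    if (rem >= 56) { sha1_block(h, blk); memset(blk, 0, sizeof blk); }
    uint64_t bits = (uint64_t)len * 8;     /* message length in bits, big-endian */
    for (int j = 0; j < 8; j++) blk[63 - j] = (uint8_t)(bits >> (8 * j));
    sha1_block(h, blk);
    for (int j = 0; j < 20; j++) out[j] = (uint8_t)(h[j / 4] >> (24 - 8 * (j % 4)));
}

/* Hypothetical stand-in for the RSA public-key operation: recovers the
 * expected SHA-1 hash from the signature, or returns false. */
typedef bool (*sig_decrypt_fn)(const uint8_t *sig, size_t sig_len, uint8_t expected[20]);

/* Hypothetical layout of the cached boot result in Power-Safe Storage. */
typedef struct { bool valid; uint8_t expected_hash[20]; } pss_boot_cache_t;

bool authenticate_program(const uint8_t *prog, size_t prog_len,
                          const uint8_t *sig, size_t sig_len,
                          bool power_on_reset, pss_boot_cache_t *pss,
                          sig_decrypt_fn decrypt) {
    uint8_t calc[20], expected[20], diff = 0;
    sha1(prog, prog_len, calc);
    bool use_sig = power_on_reset || !pss->valid;  /* assumption: fall back if PSS is empty */
    if (use_sig) {
        if (!decrypt(sig, sig_len, expected)) return false;
    } else {
        memcpy(expected, pss->expected_hash, 20);
    }
    for (int i = 0; i < 20; i++) diff |= (uint8_t)(calc[i] ^ expected[i]);
    if (diff != 0) return false;           /* caller notifies the Host PC */
    if (use_sig) {                         /* cache the result for the next wakeup */
        memcpy(pss->expected_hash, expected, 20);
        pss->valid = true;
    }
    return true;
}
```

The comparison accumulates differences rather than returning at the first mismatching byte, a common precaution so that the comparison time does not reveal how many leading bytes matched.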

Known Weaknesses: 

• If the Silverbrook private boot0key is exposed or cracked, then the system is seriously compromised. A
ROM mask change would be required to reprogram the boot0key.

10.5.3 Authentication of downloaded code in a multi-SoPEC system

10.5.3.1 ISIMaster SoPEC Process:

1) SoPEC identification by activity on USB end-points 2-4 indicates it is the ISIMaster. 

2) The SCB is configured to broadcast the data received from the Host PC. 

3) The program is downloaded to the embedded DRAM and broadcast to all ISISlave SoPECs over
the ISI.

4) The CPU calculates a SHA-1 hash digest of the downloaded program.

5) The ResetSrc register in the CPR block is read to determine whether or not a power-on reset 
occurred. 

6) If a power-on reset occurred, the signature of the downloaded code (which needs to be in a known
location such as the first or last N bytes of the downloaded code) is decrypted using the Silverbrook
public boot0key stored in ROM. This decrypted signature is the expected SHA-1 hash of the
accompanying program. The encryption algorithm is likely to be a public key algorithm such as
RSA. If a power-on reset did not occur, the expected SHA-1 hash is retrieved from the PSS and
the compute-intensive decryption is not required.

7) The calculated and expected hash values are compared and, if they match, the program's
authenticity has been verified.

8) If the hash values do not match then the Host PC is notified of the failure and software may decide
to put the SoPEC device into powerdown mode.

9) If the hash values match then the CPU starts executing the downloaded program. 

10) It is likely that the downloaded program will poll each ISISlave SoPEC for the result of its
authentication process and to determine the number of slaves present.

11) If any slave reports a failed authentication then the ISIMaster communicates this to the Host PC and
puts itself into powerdown mode.
12) If all ISISlaves report successful authentication then the downloaded program is responsible for the
downloading, authentication and distribution of subsequent programs within the multi-SoPEC
system.

13) At some subsequent point OEM code starts executing. The Silverbrook supervisor code acts as an
O/S to the OEM user mode code. The OEM code must access most SoPEC functionality via system
calls to the Silverbrook code.

14) The OEM code is expected to perform some simple 'turn on the lights' tasks after which the master 
SoPEC determines that all SoPECs are ready to print. The Host PC is informed that the printer is 
ready to print and the Start Printing use case comes into play. 
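Steps 10 to 12 of the ISIMaster process can be sketched as a simple scan over the results polled from the ISISlaves. This is a hypothetical illustration: the status encoding, treating a non-responding ISIId as "no SoPEC present", and the names `check_slaves` and `master_action_t` are assumptions, and the ISI polling itself is not shown.

```c
#include <stddef.h>

/* Hypothetical per-ISIId status, as polled over the ISI. */
typedef enum { SLAVE_NO_RESPONSE, SLAVE_AUTH_OK, SLAVE_AUTH_FAILED } slave_status_t;

typedef enum { MASTER_CONTINUE, MASTER_POWERDOWN } master_action_t;

/* Count the slaves that answered and decide whether the master may go on
 * to distribute further programs (step 12) or must report the failure to
 * the Host PC and power down (step 11). */
master_action_t check_slaves(const slave_status_t *status, size_t n_ids,
                             size_t *n_slaves) {
    size_t present = 0;
    int failed = 0;
    for (size_t i = 0; i < n_ids; i++) {
        if (status[i] == SLAVE_NO_RESPONSE) continue;   /* no SoPEC at this ISIId */
        present++;
        if (status[i] == SLAVE_AUTH_FAILED) failed = 1;
    }
    *n_slaves = present;
    return failed ? MASTER_POWERDOWN : MASTER_CONTINUE;
}
```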



10.5.3.2 ISISlave SoPEC Process:

1) When the CPU comes out of reset the SCB should still be in slave mode, and the SCB is already 
configured to receive data from the ISIMaster. 

2) The program is downloaded to embedded DRAM. 

3) The CPU calculates a SHA-1 hash digest of the downloaded program. 

4) The ResetSrc register in the CPR block is read to determine whether or not a power-on reset
occurred.

5) If a power-on reset occurred, the signature of the downloaded code (which needs to be in a known
location such as the first or last N bytes of the downloaded code) is decrypted using the Silverbrook
public boot0key stored in ROM. This decrypted signature is the expected SHA-1 hash of the
accompanying program. The encryption algorithm is likely to be a public key algorithm such as
RSA. If a power-on reset did not occur, the expected SHA-1 hash is retrieved from the PSS and
the compute-intensive decryption is not required.

6) The calculated and expected hash values are compared and, if they match, the program's
authenticity has been verified.

7) If the hash values do not match, then the ISISlave device will await a new program again, eventu- 
ally timing out and powering down. 

8) If the hash values match then the CPU starts executing the downloaded program. 

9) It is likely that the downloaded program will communicate the result of its authentication process to
the ISIMaster. The downloaded program is responsible for determining the SoPEC's ISIId, and for
receiving and authenticating any subsequent programs.

10) At some subsequent point OEM code starts executing. The Silverbrook supervisor code acts as an
O/S to the OEM user mode code. The OEM code must access most SoPEC functionality via system
calls to the Silverbrook code.

11) The OEM code is expected to perform some simple 'turn on the lights' tasks after which the master
SoPEC is informed that this slave is ready to print. The Start Printing use case then comes into play.
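Steps 6 to 8 of the ISISlave process (await a new program after a failed hash comparison, eventually time out and power down) can be modelled as a small state machine. In this hypothetical sketch the timeout value, the restarting of the wait period after a rejected program, and all names are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical ISISlave boot states. */
typedef enum { SLV_AWAIT_PROGRAM, SLV_RUNNING, SLV_POWERDOWN } slave_state_t;

typedef struct {
    slave_state_t state;
    unsigned waited_ms;    /* time spent waiting for an acceptable program */
    unsigned timeout_ms;   /* hypothetical timeout value */
} slave_boot_t;

/* Called periodically. program_received: a complete program arrived in this
 * interval; hash_ok: result of the SHA-1 comparison for that program. */
slave_state_t slave_boot_tick(slave_boot_t *sb, unsigned elapsed_ms,
                              bool program_received, bool hash_ok) {
    if (sb->state != SLV_AWAIT_PROGRAM) return sb->state;
    if (program_received) {
        if (hash_ok) {
            sb->state = SLV_RUNNING;   /* step 8: execute the downloaded program */
        } else {
            sb->waited_ms = 0;         /* step 7: await a new program (assumed restart of wait) */
        }
        return sb->state;
    }
    sb->waited_ms += elapsed_ms;
    if (sb->waited_ms >= sb->timeout_ms) sb->state = SLV_POWERDOWN;
    return sb->state;
}
```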

Known Weaknesses 

• If the Silverbrook private boot0key is exposed or cracked, then the system is seriously compromised.

• ISI is an open interface, i.e. messages sent over the ISI are in the clear. The communication channels
are insecure but all traffic is signed to guarantee authenticity. As all communication over the ISI is
controlled by Supervisor code on both the ISIMaster and ISISlave, this also provides some protection
against software attacks.
10.5.4 Authentication and upgrade of operating parameters for a printer

The SoPEC IC will be used in a range of printers with different capabilities (e.g. A3/A4 printing, printing
speed, resolution etc.). It is expected that some printers will also have a software upgrade capability which
would allow a user to purchase a license that enables an upgrade in their printer's capabilities (such as
print speed). To facilitate this it must be possible to securely store the operating parameters in the
PRINTER_QA chip, to securely communicate these parameters to the SoPEC and to securely reprogram
the parameters in the event of an upgrade. Note that each printing SoPEC (as opposed to a SoPEC that is
only used for the storage of data) will have its own PRINTER_QA chip (or at least access to a
PRINTER_QA that contains the SoPEC's SoPEC_id_key). Therefore both ISIMaster and ISISlave
SoPECs will need to authenticate operating parameters.

Process: 

1) Program code is downloaded and authenticated as described in sections 10.5.2 and 10.5.3 above. 

2) The program code has a function to create the SoPEC_id_key from the unique SoPEC_id that was
programmed when the SoPEC was manufactured.

3) The SoPEC retrieves the signed operating parameters from its PRINTER_QA chip. The
PRINTER_QA chip uses the SoPEC_id_key (which is stored as part of the pairing process executed
during printhead assembly manufacture & test) to sign the operating parameters, which are
appended with a random number to thwart replay attacks.

4) The SoPEC checks the signature of the operating parameters using its SoPEC_id_key. If this signature
authentication process is successful then the operating parameters are considered valid and the
overall boot process continues. If not, the error is reported to the Host PC.

5) Operating parameters may also be set or upgraded using a second key, the PrintEngineLicense_key,
which is stored on the PRINTER_QA and used to authenticate the change in operating parameters.
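The replay protection in steps 3 and 4 relies on the returned parameters carrying the random number that was used for this request, as well as a valid signature. The sketch below assumes that SoPEC chooses the random number, uses a hypothetical message layout (`signed_params_t`), and leaves the keyed signature computation with the SoPEC_id_key to the caller (passed in as `expected_tag`), since the signature algorithm is not specified in this section. Only the acceptance checks are shown.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of a signed operating-parameter response. */
typedef struct {
    uint8_t params[16];   /* operating parameters (hypothetical size) */
    uint8_t nonce[8];     /* random number appended before signing */
    uint8_t tag[20];      /* signature produced by PRINTER_QA */
} signed_params_t;

/* Accept the parameters only if they echo the nonce SoPEC sent for this
 * request (thwarting replay of an older response) and the signature equals
 * the value SoPEC computed itself with its SoPEC_id_key. */
bool accept_params(const signed_params_t *msg, const uint8_t sent_nonce[8],
                   const uint8_t expected_tag[20]) {
    if (memcmp(msg->nonce, sent_nonce, 8) != 0) return false;  /* stale or replayed */
    uint8_t diff = 0;
    for (int i = 0; i < 20; i++) diff |= (uint8_t)(msg->tag[i] ^ expected_tag[i]);
    return diff == 0;   /* on failure the error is reported to the Host PC */
}
```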

Known Weaknesses: 

• It may be possible to retrieve the unique SoPEC_id by placing the SoPEC in test mode and scanning it
out. It is certainly possible to obtain it by reverse engineering the device. Either way the SoPEC_id
(and by extension the SoPEC_id_key) so obtained is valid only for that specific SoPEC, and so printers
may only be compromised one at a time by parties with the appropriate specialised equipment.
Furthermore, even if the SoPEC_id is compromised, the other keys in the system, which protect the
authentication of consumables and of program code, are unaffected.
10.6 Miscellaneous Use Cases



There are many miscellaneous use cases such as the following examples. Software running on the SoPEC 
CPU or Host will decide on what actions to take in these scenarios. 



10.6.1 Disconnect / Reconnect of QA chips.

1) Disconnect of a QA chip between documents or if ink runs out mid-document.

2) Re-connect of a QA chip once authenticated, e.g. ink cartridge replacement should allow the system
to resume and print the next document.

10.6.2 Page arrives before print ready interrupt. 

1) Engage clutch to stop paper until print ready interrupt occurs. 

10.6.3 Dead-nozzle table upgrade



This sequence is typically performed when dead nozzle information needs to be updated by performing a
printhead dead nozzle test.

1) Run printhead nozzle test sequence.

2) Either Host or SoPEC CPU converts dead nozzle information into dead nozzle table. 

3) Store dead nozzle table on Host. 

4) Write dead nozzle table to SoPEC DRAM. 
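Step 2 can be illustrated by converting per-nozzle pass/fail results into a sorted list of dead nozzle indices that can be searched quickly. The table format here is hypothetical; the actual dead nozzle table layout used by the PEP is defined elsewhere and is not reproduced, and the function names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Convert per-nozzle test results (true = nozzle fired correctly) into an
 * ascending table of dead nozzle indices. Returns the number of entries,
 * or -1 if the table capacity is too small. */
int build_dead_nozzle_table(const bool *nozzle_ok, size_t n_nozzles,
                            uint32_t *table, size_t capacity) {
    size_t n = 0;
    for (size_t i = 0; i < n_nozzles; i++) {
        if (nozzle_ok[i]) continue;
        if (n == capacity) return -1;
        table[n++] = (uint32_t)i;
    }
    return (int)n;
}

/* Binary search of the ascending table: is nozzle idx listed as dead? */
bool nozzle_is_dead(const uint32_t *table, size_t n, uint32_t idx) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid] < idx) lo = mid + 1;
        else hi = mid;
    }
    return lo < n && table[lo] == idx;
}
```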
10.7 Failure Mode Use Cases

10.7.1 System errors and security violations 

System errors and security violations are reported to the SoPEC CPU and Host. Software running on the
SoPEC CPU or Host will then decide what actions to take.



Silverbrook code authentication failure. 

1) Notify Host PC of authentication failure. 

2) Abort print run. 

OEM code authentication failure.

1) Notify Host PC of authentication failure.

2) Abort print run. 

Invalid QA chip(s). 

1) Report to Host PC. 

2) Abort print run. 

MMU security violation interrupt. 

1) This is handled by exception handler. 

2) Report to Host PC.

3) Abort print run. 

Invalid address interrupt from PCU. 

1) This is handled by exception handler. 

2) Report to Host PC.

3) Abort print run. 

Watchdog timer interrupt. 

1) This is handled by exception handler. 

2) Report to Host PC.

3) Abort print run. 

Host PC does not acknowledge message that SoPEC is about to power down. 
1) Power down anyway. 



10.7.2 Printing errors 

Printing errors are reported to the SoPEC CPU and Host. Software running on the Host or SoPEC CPU 
will then decide what actions to take. 



Insufficient space available in SoPEC compressed band-store to download a band. 
1) Report to the Host PC. 

Insufficient ink to print. 
1) Report to Host PC.

Page not downloaded in time while printing. 

1) Buffer underrun interrupt will occur.

2) Report to Host PC and abort print run. 

JPEG decoder error interrupt. 
1) Report to Host PC.
CPU Subsystem
11 Central Processing Unit (CPU)

11.1 Overview

The CPU block consists of the CPU core, MMU, cache and associated logic. The principal tasks for the
program running on the CPU to fulfill in the system are:

Communications: 

• Control the flow of data from the USB interface to the DRAM and ISI 

• Communication with the host via USB or ISI 

• Running the USB device driver 

PEP Subsystem Control: 

• Page and band header processing (possibly performed on the host PC)

• Configure printing options on a per band, per page, per job or per power cycle basis 

• Initiate page printing operation in the PEP subsystem 

• Retrieve dead nozzle information from the printhead interface (PHI) and forward to the host PC 

• Select the appropriate firing pulse profile from a set of predefined profiles based on the printhead 
characteristics 

• Retrieve printhead temperature via the PHI 

Security: 

• Authenticate downloaded program code and printer operating parameters 

• Authenticate consumables via the PRINTER_QA and INK_QA chips

• Monitor ink usage 

• Isolation of OEM code from direct access to the system resources

Other:

• Drive the printer motors using the GPIO pins 

• Monitoring the status of the printer (paper jam, tray empty etc.) 

• Driving front panel LEDs 

• Perform post-boot initialisation of the SoPEC device 

• Memory management (likely to be in conjunction with the host PC)

• Miscellaneous housekeeping tasks 

To control the Print Engine Pipeline the CPU is required to provide a level of performance at least
equivalent to a 16-bit Hitachi H8-3664 microcontroller running at 16 MHz. An as yet undetermined amount
of additional CPU performance is needed to perform the other tasks. The extra performance required is
dominated by the signature verification task and the SCB (including the USB) management task. An
operating system is not required at present. A number of CPU cores have been evaluated and the LEON
P1754 is considered to be the most appropriate solution. A diagram of the CPU block is shown in Figure 15
below.
Figure 15. CPU block diagram (LEON core, cache & MMU, AHB interface, AHB controller, address decoder
and realtime debug unit; the block's I/O signals are listed in Table 14)
11.2 Definitions of I/Os



Table 14. CPU Subsystem I/Os

Port name               Pins  I/O  Description

Clocks and Resets
prst_n                  1     In   Global reset. Synchronous to pclk, active low.
pclk                    1     In   Global clock.

CPU to DIU DRAM Interface
cpu_adr[21:0]           22    Out  Address bus for both DRAM and peripheral access.
cpu_dataout[31:0]       32    Out  Data out to both DRAM and peripheral devices. This should be
                                   driven at the same time as the cpu_adr and request signals.
dram_cpu_data[255:0]    256   In   Read data from the DRAM.
cpu_diu_rreq            1     Out  Read request to the DIU DRAM.
diu_cpu_rack            1     In   Acknowledge from DIU that read request has been accepted.
diu_cpu_rvalid          1     In   Signal from DIU telling SoPEC Unit that valid read data is on
                                   the dram_cpu_data bus.
cpu_diu_wreq            1     Out  Write request to the DIU.
diu_cpu_wack            1     In   Acknowledge from the DIU that the write request has been
                                   accepted.
cpu_diu_wvalid          1     Out  Signal from the CPU to the DIU indicating that the data
                                   currently on the cpu_dataout bus is valid.
cpu_diu_wmask[1:0]      2     Out  Flag indicating format of CPU write to DRAM:
                                     cpu_diu_wmask = 00: 8-bit write
                                     cpu_diu_wmask = 01: 16-bit write
                                     cpu_diu_wmask = 10: 32-bit write
                                     cpu_diu_wmask = 11: reserved
                                   cpu_adr[2:0] are driven in accordance with the width of the
                                   data access indicated by cpu_diu_wmask. Addresses cannot
                                   cross a 256-bit word DRAM boundary.

CPU to peripheral blocks
cpu_rwn                 1     Out  Common read/not-write signal from the CPU.
cpu_acode[1:0]          2     Out  CPU access code signals:
                                     cpu_acode[0]: Program (0) / Data (1) access
                                     cpu_acode[1]: User (0) / Supervisor (1) access
cpu_cpr_sel             1     Out  CPR block select.
cpr_cpu_rdy             1     In   Ready signal to the CPU. When cpr_cpu_rdy is high it
                                   indicates the last cycle of the access. For a write cycle this
                                   means cpu_dataout has been registered by the CPR block and
                                   for a read cycle this means the data on cpr_cpu_data is valid.
cpr_cpu_berr            1     In   CPR bus error signal to the CPU.
cpr_cpu_data[31:0]      32    In   Read data bus from the CPR block.
cpu_gpio_sel            1     Out  GPIO block select.
gpio_cpu_rdy            1     In   GPIO ready signal to the CPU.
gpio_cpu_berr           1     In   GPIO bus error signal to the CPU.
gpio_cpu_data[31:0]     32    In   Read data bus from the GPIO block.
cpu_icu_sel             1     Out  ICU block select.
icu_cpu_rdy             1     In   ICU ready signal to the CPU.
icu_cpu_berr            1     In   ICU bus error signal to the CPU.
icu_cpu_data[31:0]      32    In   Read data bus from the ICU block.
cpu_lss_sel             1     Out  LSS block select.
lss_cpu_rdy             1     In   LSS ready signal to the CPU.
lss_cpu_berr            1     In   LSS bus error signal to the CPU.
lss_cpu_data[31:0]      32    In   Read data bus from the LSS block.
cpu_pcu_sel             1     Out  PCU block select.
pcu_cpu_rdy             1     In   PCU ready signal to the CPU.
pcu_cpu_berr            1     In   PCU bus error signal to the CPU.
pcu_cpu_data[31:0]      32    In   Read data bus from the PCU block.
cpu_scb_sel             1     Out  SCB block select.
scb_cpu_rdy             1     In   SCB ready signal to the CPU.
scb_cpu_berr            1     In   SCB bus error signal to the CPU.
scb_cpu_data[31:0]      32    In   Read data bus from the SCB block.
cpu_tim_sel             1     Out  Timers block select.
tim_cpu_rdy             1     In   Timers block ready signal to the CPU.
tim_cpu_berr            1     In   Timers bus error signal to the CPU.
tim_cpu_data[31:0]      32    In   Read data bus from the Timers block.
cpu_rom_sel             1     Out  ROM block select.
rom_cpu_rdy             1     In   ROM block ready signal to the CPU.
rom_cpu_berr            1     In   ROM bus error signal to the CPU.
rom_cpu_data[31:0]      32    In   Read data bus from the ROM block.
cpu_pss_sel             1     Out  PSS block select.
pss_cpu_rdy             1     In   PSS block ready signal to the CPU.
pss_cpu_berr            1     In   PSS bus error signal to the CPU.
pss_cpu_data[31:0]      32    In   Read data bus from the PSS block.
cpu_diu_sel             1     Out  DIU register block select.
diu_cpu_rdy             1     In   DIU register block ready signal to the CPU.
diu_cpu_berr            1     In   DIU bus error signal to the CPU.
diu_cpu_data[31:0]      32    In   Read data bus from the DIU block.

Interrupt signals
icu_cpu_ilevel[3:0]     4     In   An interrupt is asserted by driving the appropriate priority
                                   level on icu_cpu_ilevel. These signals must remain asserted
                                   until the CPU executes an interrupt acknowledge cycle.
cpu_icu_ilevel[3:0]     4     Out  Indicates the level of the interrupt the CPU is acknowledging
                                   when cpu_iack is high.
cpu_iack                1     Out  Interrupt acknowledge signal. The exact timing depends on the
                                   CPU core implementation.

Debug signals
diu_cpu_debug_valid     1     In   Signal indicating the data on the diu_cpu_data bus is valid
                                   debug data.
tim_cpu_debug_valid     1     In   Signal indicating the data on the tim_cpu_data bus is valid
                                   debug data.
scb_cpu_debug_valid     1     In   Signal indicating the data on the scb_cpu_data bus is valid
                                   debug data.
pcu_cpu_debug_valid     1     In   Signal indicating the data on the pcu_cpu_data bus is valid
                                   debug data.
lss_cpu_debug_valid     1     In   Signal indicating the data on the lss_cpu_data bus is valid
                                   debug data.
icu_cpu_debug_valid     1     In   Signal indicating the data on the icu_cpu_data bus is valid
                                   debug data.
gpio_cpu_debug_valid    1     In   Signal indicating the data on the gpio_cpu_data bus is valid
                                   debug data.
cpr_cpu_debug_valid     1     In   Signal indicating the data on the cpr_cpu_data bus is valid
                                   debug data.
debug_data_out          18    Out  Output debug data to be muxed on to the PHI pins.
debug_data_valid        1     Out  Debug valid signal indicating the validity of the data on
                                   debug_data_out. This signal is used in all debug
                                   configurations.
debug_cntrl             20    Out  Control signal for each PHI bound debug data line indicating
                                   whether or not the debug data should be selected by the pin
                                   mux.



11.3 Realtime requirements 

The SoPEC realtime requirements have yet to be fully determined but they may be split into three
categories: hard, firm and soft.

11.3.1 Hard realtime requirements

Hard requirements are tasks that must be completed before a certain deadline or failure to do so will result
in an error perceptible to the user (printing stops or functions incorrectly). There are three hard realtime
tasks:

• Motor control: The motors which feed the paper through the printer at a constant speed during
printing are driven directly by the SoPEC device. Four periodic signals with different phase
relationships need to be generated to ensure the paper travels smoothly through the printer. The
generation of these signals is handled by the GPIO hardware (see section 13.2 for more details) but the
CPU is responsible for enabling these signals (i.e. to start or stop the motors) and coordinating the
movement of the paper with the printing operation of the printhead.

• Buffer management: Data enters the SoPEC via the SCB at an uneven rate and is consumed by the
PEP subsystem at a different rate. The CPU is responsible for managing the DRAM buffers to
ensure that neither overrun nor underrun occurs. This buffer management is likely to be performed
under the direction of the host.

• Band processing: In certain cases PEP registers may need to be updated between bands. As the timing
requirements are most likely too stringent to be met by direct CPU writes to the PCU, a more
likely scenario is that a set of shadow registers will be programmed in the compressed page units
before the current band is finished, copied to band related registers by the finished band signals, and
the processing of the next band will continue immediately. An alternative solution is that the CPU
will construct a DRAM based set of commands (see section 21.8.5 for more details) that can be
executed by the PCU. The task for the CPU here is to parse the band headers stored in DRAM and
generate a DRAM based set of commands for the next number of bands. The location of the DRAM
based set of commands must then be written to the PCU before the current band has been processed
by the PEP subsystem. It is also conceivable (but currently considered unlikely) that the host PC
could create the DRAM based commands. In this case the CPU will only be required to point the
PCU to the correct location in DRAM to execute commands from.
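The alternative band-processing scheme, in which the CPU parses band headers and builds a DRAM based command list for the PCU, can be sketched as below. Everything in it is hypothetical: the command entry format, the band header fields and the register addresses are placeholders (the real command format is described in section 21.8.5), and a real band would typically need 3-5 commands per decompression unit rather than the single command per unit shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DRAM command entry: write value to PEP register addr. */
typedef struct { uint32_t addr; uint32_t value; } pcu_cmd_t;

/* Hypothetical subset of a parsed band header. */
typedef struct { uint32_t cdu_start, lbd_start, te_start; } band_header_t;

/* Placeholder register addresses, for illustration only. */
enum { REG_CDU_BAND_START = 0x100, REG_LBD_BAND_START = 0x200, REG_TE_BAND_START = 0x300 };

/* Translate the next n_bands band headers into a flat command list that the
 * PCU can execute from DRAM. Returns the number of commands written, or 0
 * if cap is too small. */
size_t build_band_commands(const band_header_t *hdr, size_t n_bands,
                           pcu_cmd_t *cmds, size_t cap) {
    if (n_bands * 3 > cap) return 0;
    size_t n = 0;
    for (size_t b = 0; b < n_bands; b++) {
        cmds[n++] = (pcu_cmd_t){ REG_CDU_BAND_START, hdr[b].cdu_start };
        cmds[n++] = (pcu_cmd_t){ REG_LBD_BAND_START, hdr[b].lbd_start };
        cmds[n++] = (pcu_cmd_t){ REG_TE_BAND_START,  hdr[b].te_start };
    }
    return n;
}
```

The CPU would then write the DRAM address of `cmds` to the PCU before the current band has been processed; that register write is not shown.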
11.3.2 Firm requirements



Firm requirements are tasks that should be completed by a certain time or failure to do so will result in a
degradation of performance but not an error. The majority of the CPU tasks for SoPEC fall into this
category, including all interactions with the QA chips, program authentication, page feeding, configuring
PEP registers for a page or job, determining the firing pulse profile, communication of printer status to the
host over the USB and the monitoring of ink usage. The authentication of downloaded programs and
messages will be the most compute-intensive operation the CPU will be required to perform. Initial
investigations indicate that the LEON processor, running at 160 MHz, will easily perform three
authentications in under a second.

Table 15. Expected firm requirements

Requirement | Duration
Power-on to start of printing first page (USB and slave SoPEC enumeration, 3 or more RSA signature verifications, code and compressed page data download and chip initialisation) | ~8 secs ??
Wake-up from sleep mode to start printing (3 or more SHA-1 operations, code and compressed page data download and chip re-initialisation) | ~2 secs
Authenticate ink usage in the printer | ~0.5 secs
Determining firing pulse profile | ~0.1 secs
Page feeding, gap between pages | OEM dependent
Communication of printer status to host PC | ~10 ms
Configuring PEP registers | ??



11.3.3 Soft requirements



Soft requirements are tasks that need to be done but there are only light time constraints on when they need
to be done. These tasks are performed by the CPU when there are no pending higher priority tasks. As the
SoPEC CPU is expected to be lightly loaded, these tasks will mostly be executed soon after they are
scheduled.



11.4 Bus Protocols



As can be seen from Figure 15 above, there are different buses in the CPU block and different protocols are
used for each bus. There are three buses in operation:



11.4.1 CPU core to cache/MMU bus



This is the native bus of the CPU core. See section 11.6.6.1 for more details. Timing and full signal details
should be provided in the documentation accompanying this core.



11.4.2 Cache/MMU to DIU bus 



This bus conforms to the DIU bus protocol described in Section 20.13.2. Note that the address and data
buses are shared with the peripheral bus. The effective bus width differs between a read (256 bits) and a
write (32/16/8 bits), and only the bottom 32 bits of the bus are shared with the peripheral bus. As certain
CPU instructions may require byte write access, this will need to be supported in the DIU. See section
11.6.6.2 for more details.
11.4.3 CPU Subsystem Bus

For access to the on-chip peripherals a simple bus protocol is used. The MMU must first determine which
particular block is being addressed (and that the access is a valid one) so that the appropriate block select
signal can be generated. During a write access, CPU write data is driven out with the address and block
select signals in the first cycle of an access. The addressed slave peripheral responds by asserting its ready
signal, indicating that it has registered the write data and the access can complete. The write data bus is
common to all peripherals and is also used for CPU writes to the embedded DRAM. A read access is initiated
by driving the address and select signals during the first cycle of an access. The addressed slave
responds by placing the read data on its bus and asserting its ready signal to indicate to the CPU that the
read data is valid. Each block has a separate point-to-point data bus for read accesses to avoid the need for
a tri-stateable bus.

All peripheral accesses are 32-bit. Support for byte or 16-bit accesses may be added if required by an
imported IP block such as the USB controller. The use of the ready signal allows the accesses to be of
variable length. In most cases accesses will complete in two cycles, but three- or four-cycle (or longer) accesses
are likely for PEP blocks or IP blocks with a different native bus interface. All PEP blocks are accessed via
the PCU, which acts as a bridge. The PCU bus uses a similar protocol to the CPU subsystem bus but with
the PCU as the bus master.

The duration of accesses to the PEP blocks is influenced by whether or not the PCU is executing commands
from DRAM. As these commands are essentially register writes, a CPU access will need to wait
until the PCU bus becomes available, i.e. until the current register access has completed. This could lead to the
CPU being stalled for up to 4 cycles if it attempts to access PEP blocks while the PCU is executing a command.
The size and probability of this penalty are sufficiently small that it should not have any significant impact
on performance.

In order to support user mode (i.e. OEM code) access to certain peripherals, the CPU subsystem bus
propagates the CPU function code signals (cpu_acode[1:0]). These signals indicate the type of address space
(i.e. User/Supervisor and Program/Data) being accessed by the CPU for each access. Each peripheral must
determine whether or not the CPU is in the correct mode to be granted access to its registers, and in some
cases (e.g. Timers and GPIO blocks) different access permissions can apply to different registers within
the block. If the CPU is not in the correct mode then the violation is flagged by asserting the block's bus
error signal (block_cpu_berr) with the same timing as its ready signal (block_cpu_rdy), which remains
deasserted. When this occurs, invalid read accesses should return 0 and write accesses should have no
effect.

Figure 16 shows two examples of the peripheral bus protocol in action. A write to the LSS block from
code running in supervisor mode is successfully completed. This is immediately followed by a read from a
PEP block via the PCU from code running in user mode. As this type of access is not permitted, the access
is terminated with a bus error. The bus error exception processing then starts directly after this; no further
accesses to the peripheral should be required as the exception handler should be located in the DRAM.

Each peripheral acts as a slave on the CPU subsystem bus and its behavior is described by the state
machine in section 11.4.3.1.
[Timing diagram. Signals shown: pclk, cpu_adr[21:0], cpu_rwn, cpu_acode[1:0], cpu_lss_sel, lss_cpu_rdy, lss_cpu_berr, cpu_dataout[31:0], cpu_pcu_sel, pcu_cpu_berr, pcu_cpu_rdy, pcu_cpu_data[31:0]. Sequence: an LSS address accessed with a Supvr Data access code, then a PEP address with a User Data access code (pcu_cpu_data returns 0x0000_0000 with a bus error), then a Supervisor stack access with a Supvr Data access code.]
Figure 16. CPU bus transactions 

11.4.3.1 CPU subsystem bus slave state machine

CPU subsystem bus slave operation is described by the state machine in Figure 17. This state machine
will be implemented in each CPU subsystem bus slave. The only new signals mentioned here are the
valid_access and reg_available signals. The valid_access signal is determined by comparing the cpu_acode
value with the block or register access permissions (the latter in the case of a block that allows user access
on a per-register basis, such as the GPIO block), and asserting valid_access if the permissions agree with the CPU
mode. The reg_available signal is only required in the PCU or in blocks that are not capable of two-cycle
access (e.g. blocks containing imported IP with different bus protocols). In these blocks the reg_available
signal is an internal signal used to insert wait states (by delaying the assertion of block_cpu_rdy) until the
CPU bus slave interface can gain access to the register.

When reading from a register that is less than 32 bits wide, the CPU subsystem bus slave should return
zeroes on the unused upper bits of the block_cpu_data bus.

To support debug mode, the contents of the register selected for debug observation, debug_reg, are always
output on the block_cpu_data bus whenever a read access is not taking place. See section 11.8 for more
details of debug operation.
[State machine diagram. Recoverable labels: block_cpu_rdy, block_cpu_data, block_cpu_berr = 0, block_cpu_berr = 1, reg_data = cpu_dataout, reg_available.]
Figure 17. State machine for a CPU subsystem slave 
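The slave behaviour described in sections 11.4.3 and 11.4.3.1 can be summarised in a transaction-level model. This is a sketch, not the RTL: it collapses the state machine into one call per access, models reg_available as a fixed number of wait states, and assumes a hypothetical cpu_acode[1:0] encoding (bit 1 = supervisor, bit 0 = data) chosen to mirror hprot[1:0] in Table 20; the real encoding is not given here. The class and method names are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical cpu_acode[1:0] encoding (assumed to mirror AHB hprot[1:0]):
# bit 1: 1 = supervisor, 0 = user; bit 0: 1 = data, 0 = program (opcode fetch).
USER_PROG, USER_DATA, SUPV_PROG, SUPV_DATA = 0b00, 0b01, 0b10, 0b11


@dataclass
class Response:
    cycles: int  # pclk cycles from the start of the access to rdy/berr
    rdy: bool    # block_cpu_rdy
    berr: bool   # block_cpu_berr
    data: int    # block_cpu_data for reads (0 for writes and refused reads)


class BusSlave:
    """Transaction-level sketch of one CPU subsystem bus slave with one register."""

    def __init__(self, reg_bits, allowed_acodes, wait_states=0):
        self.reg_bits = reg_bits               # register width (<= 32 bits)
        self.allowed = frozenset(allowed_acodes)
        self.wait_states = wait_states         # cycles spent waiting for reg_available
        self.reg_data = 0

    def access(self, acode, rwn, wdata=0):
        # Two-cycle access plus any wait states inserted while reg_available is low.
        # Simplification: a refused access is given the same timing.
        cycles = 2 + self.wait_states
        valid_access = acode in self.allowed
        if not valid_access:
            # berr uses the timing of rdy while rdy stays deasserted;
            # a refused read returns 0 and a refused write has no effect.
            return Response(cycles, rdy=False, berr=True, data=0)
        mask = (1 << self.reg_bits) - 1
        if rwn:
            # Unused upper bits of block_cpu_data are returned as zero.
            return Response(cycles, rdy=True, berr=False, data=self.reg_data & mask)
        self.reg_data = wdata & mask           # reg_data = cpu_dataout
        return Response(cycles, rdy=True, berr=False, data=0)


if __name__ == "__main__":
    lss = BusSlave(reg_bits=8, allowed_acodes={SUPV_DATA})
    lss.access(SUPV_DATA, rwn=False, wdata=0x1FF)   # supervisor write succeeds
    print(lss.access(USER_DATA, rwn=True))          # user read: bus error, data 0
```

A per-register permission scheme (as in the GPIO block) would simply hold one allowed set per register instead of one per block.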



11.5 LEON CPU 

The LEON processor is an open-source implementation of the IEEE-1754 standard (SPARC V8) instruction
set. LEON is available from and actively supported by Gaisler Research (www.gaisler.com).

The following features of the LEON-2 processor will be utilised on SoPEC: 

• IEEE-1754 (SPARC V8) compatible integer unit with 5-stage pipeline

• Separate instruction and data cache (Harvard architecture)

• Set-associative caches: 1-4 sets, 1-64 kbyte/set. Random, LRR or LRU replacement. Direct
mapped caches are also available and are the more likely option for SoPEC.

• Full implementation of AMBA-2.0 AHB on-chip bus

• Power-down mode

The standard release of LEON incorporates a number of peripherals and support blocks which will not be
included on SoPEC. The LEON core as used on SoPEC will consist of: 1) the LEON integer unit, 2) possibly
the instruction and data caches (currently under review), 3) the cache control logic (to be significantly
reduced by optimisation if the caches are not used), 4) the AHB interface and 5) possibly the AHB
controller (although this functionality may be implemented in the LEON Bridge).

The version of the LEON database that the SoPEC LEON components will be sourced from is LEON2-1.0.8,
although later versions may be used if they offer worthwhile functionality or bug fixes that affect the
SoPEC design. Note that if the LEON caches are not used then we may revert to v1.0.7 of the database, as
the cache control logic is likely to be simpler and easier to optimise away (v1.0.8 introduced support for
set-associative caching).

The LEON core will be clocked using the system clock, pclk, and reset using the prst_n_section[1] signal.
The ICU will assert all the hardware interrupts using the protocol described in section 11.9. The particular
types of SRAMs (for LEON caches) and register files used will be determined during the implementation
phase. The LEON hardware multipliers are not expected to be required. Furthermore, it is anticipated that
SoPEC will use the recommended 8 register window configuration.

Further details of the SPARC V8 instruction set and the LEON processor can be found in [32] and [33]
respectively.

11.6 Memory Management Unit (MMU)

Memory Management Units are typically used to protect certain regions of memory from invalid accesses,
to perform address translation for a virtual memory system and to maintain memory page status (swapped-in,
swapped-out or unmapped).

The SoPEC MMU is a much simpler affair whose function is to ensure that all regions of the SoPEC memory
map are adequately protected. The MMU does not support virtual memory, and physical addresses are
used at all times; the one exception is the address translation of the reset vector. The SoPEC MMU
supports a full 32-bit address space. A proposed memory map is shown in Figure 18 below.

The MMU selects the relevant bus protocol and generates the appropriate control signals depending on the
area of memory being accessed. The MMU is responsible for performing the address decode and generation
of the appropriate block select signal, as well as the selection of the correct block read bus during a
read access. The MMU will need to support all of the bus transactions the CPU can produce, including
interrupt acknowledge cycles, aborted transactions, etc.

When an MMU error occurs (such as an attempt to access a supervisor-mode-only region when in user
mode) a bus error is generated. While the LEON can recognise different types of bus error (e.g. data store
error, instruction access error), it appears to handle them in the same manner as it handles all traps, i.e. it will
transfer control to a trap handler. No extra state information appears to be stored because of the nature of
the trap. The location of the trap handler is contained in the TBR (Trap Base Register). This is the same
mechanism as is used to handle interrupts. Further investigation is needed to determine exactly how LEON
behaves when a bus error type trap occurs, in order to choose the best approach to handling bus errors. It may be
simplest to just treat them as the highest priority interrupt.
[Memory map diagram, from the top of the address space down:
0xFFFF_FFFF to 0x002A_C000: unused; accesses in this area are not allowed and result in a bus error exception.
PCU Mapped Registers, Peripheral Registers (from 0x0029_0000) and ROM (from 0x0028_0000): accesses in this area are via the CPU bus and are controlled by permissions set in each peripheral.
DRAM Regions (from 0x0000_0000): accesses in this area are via the DIU bus and are controlled by permissions set in the MMU.]
Figure 18. Proposed SoPEC CPU memory map (not to scale) 

11.6.1 CPU-bus peripherals address map

The address mapping for the peripherals attached to the CPU-bus is shown in Table 16 below. The MMU
performs the decode of the high order bits to generate the relevant block select signal (e.g. cpu_lss_sel,
cpu_pcu_sel). Apart from the PCU, which decodes the address space for the PEP blocks, each block only
needs to decode as many bits of cpu_adr[11:2] as required to address all the registers within the block.

Table 16. CPU-bus peripherals address map

Block_base | Address
MMU_base | 0x0029_0000
TIM_base | 0x0029_1000
LSS_base | 0x0029_2000
GPIO_base | 0x0029_3000
SCB_base | 0x0029_4000
ICU_base | 0x0029_5000
CPR_base | 0x0029_6000
ROM_base | 0x0029_7000
DIU_base | 0x0029_8000
PSS_base | 0x0029_9000
Reserved | 0x0029_A000 to 0x0029_FFFF
PCU_base | 0x002A_0000
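To illustrate the decode split described in section 11.6.1, the following sketch maps a CPU address onto a block name and a register index using the base addresses in Table 16. It assumes each CPU-bus peripheral owns one 4 KB slot (as the 0x1000 spacing of the bases suggests) and takes the end of the PCU range, 0x002A_BFFF, from section 11.6.3. The function name and return format are hypothetical.

```python
# Block order follows the base addresses in Table 16: MMU_base = 0x0029_0000,
# TIM_base = 0x0029_1000, ..., PSS_base = 0x0029_9000 (one 4 KB slot each).
PERIPHERALS = ["MMU", "TIM", "LSS", "GPIO", "SCB", "ICU", "CPR", "ROM", "DIU", "PSS"]
PERI_BASE, PERI_TOP = 0x0029_0000, 0x0029_FFFF
PCU_BASE, PCU_TOP = 0x002A_0000, 0x002A_BFFF


def decode_cpu_bus(adr):
    """Return (block, register index) for a CPU-bus address, or (None, None)
    for reserved space (which the MMU would answer with a bus error)."""
    if PCU_BASE <= adr <= PCU_TOP:
        # The PCU decodes the whole PEP register space itself.
        return "PCU", (adr - PCU_BASE) >> 2
    if PERI_BASE <= adr <= PERI_TOP:
        slot = (adr >> 12) & 0xF           # high-order decode done by the MMU
        if slot < len(PERIPHERALS):
            # Each block decodes only (part of) cpu_adr[11:2].
            return PERIPHERALS[slot], (adr >> 2) & 0x3FF
    return None, None


print(decode_cpu_bus(0x0029_2004))  # ('LSS', 1)
```

Splitting the decode this way keeps each peripheral's address comparator small: only the MMU looks at the high-order bits.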



11.6.2 DRAM Region Mapping

The embedded DRAM is broken into 8 regions, with each region defined by a lower and upper bound
address and with its own access permissions.

The association of an area in the DRAM address space with an MMU region is completely under software
control. Table 17 below gives one possible region mapping. Regions should be defined according to their
access requirements and position in memory. Regions that share the same access requirements and that are
contiguous in memory may be combined into a single region. The example below is purely for indicative
purposes; real mappings are likely to differ significantly from this. Note that the RegionBottom and
RegionTop fields in this example are byte addresses and would need to be right-shifted by 5 places to obtain the
256-bit aligned value used to program the RegionNTop and RegionNBottom registers; for more details see
sections 11.6.5.1 and 11.6.5.2.



Table 17. Example region mapping

Region | RegionBottom | RegionTop | Description
0 | 0x0000_0000 | 0x0000_0FFF | Silverbrook OS (supervisor) data
1 | 0x0000_1000 | 0x0000_BFFF | Silverbrook OS (supervisor) code
2 | 0x0000_C000 | 0x0000_C3FF | Silverbrook (supervisor/user) data
3 | 0x0000_C400 | 0x0000_CFFF | Silverbrook (supervisor/user) code
4 | 0x0026_D000 | 0x0026_D3FF | OEM (user) data
5 | 0x0026_D400 | 0x0026_DFFF | OEM (user) code
6 | 0x0027_E000 | 0x0027_FFFF | Shared Silverbrook/OEM space
7 | 0x0000_D000 | 0x0026_CFFF | Compressed page store (supervisor data)
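The conversion mentioned before Table 17 (byte address right-shifted by 5 places) can be checked with a small helper. This is a sketch assuming every region starts and ends on a 256-bit word boundary, which holds for all rows of Table 17; the helper name is hypothetical.

```python
WORD_BYTES = 32  # one 256-bit DRAM word


def to_region_registers(bottom_byte, top_byte):
    """Convert inclusive byte bounds into (RegionNBottom, RegionNTop) values."""
    if bottom_byte % WORD_BYTES or (top_byte + 1) % WORD_BYTES:
        raise ValueError("region must start and end on a 256-bit word boundary")
    bottom, top = bottom_byte >> 5, top_byte >> 5
    if top >= 1 << 17:
        raise ValueError("address outside the 17-bit register range")
    return bottom, top


# Region 2 and region 6 of Table 17.
print([hex(v) for v in to_region_registers(0x0000_C000, 0x0000_C3FF)])  # ['0x600', '0x61f']
print([hex(v) for v in to_region_registers(0x0027_E000, 0x0027_FFFF)])  # ['0x13f00', '0x13fff']
```

Region 6 ends at word 0x13FFF (81919), the last of the 81920 DRAM words, which is a quick sanity check that the example mapping fits the 2.5 MByte DRAM.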



11.6.3 Non-DRAM regions

As shown in Figure 18, the DRAM occupies only 2.5 MBytes of the total 4 GB SoPEC address space. The
non-DRAM regions of SoPEC are handled by the MMU as follows:

ROM (0x0028_0000 to 0x0028_FFFF): The ROM block will control the access types allowed. The
cpu_acode[1:0] signals will indicate the CPU mode and access type, and the ROM block will assert
rom_cpu_berr if an attempted access is forbidden. The protocol is described in more detail in section
11.4.3. The ROM block access permissions are hard wired to allow all read accesses except to the FuseChipID
registers, which may only be read in supervisor mode.

MMU Internal Registers (0x0029_0000 to 0x0029_0FFF): The MMU is responsible for controlling the
accesses to its own internal registers and will only allow data reads and writes (no instruction fetches)
from supervisor data space. All other accesses will result in the mmu_cpu_berr signal being asserted in
accordance with the CPU native bus protocol.

CPU Subsystem Peripheral Registers (0x0029_1000 to 0x0029_FFFF): Each peripheral block will
control the access types allowed. Every peripheral will allow supervisor data accesses (both read and
write) and some blocks (e.g. Timers and GPIO) will also allow user data space accesses as outlined in the
relevant chapters of this specification. Neither supervisor nor user instruction fetch accesses are allowed to
any block, as it is not possible to execute code from peripheral registers. The bus protocol is described in
section 11.4.3.

PCU Mapped Registers (0x002A_0000 to 0x002A_BFFF): All of the PEP block registers which are
accessed by the CPU via the PCU will inherit the access permissions of the PCU. These access permissions
are hard wired to allow supervisor data accesses only, and the protocol used is the same as for the
CPU peripherals.

Unused address space (0x002A_C000 to 0xFFFF_FFFF): All accesses to the unused portion of the
address space will result in the mmu_cpu_berr signal being asserted in accordance with the CPU native
bus protocol. These accesses will not propagate outside of the MMU, i.e. no external access will be
initiated.
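Taken together with the 2.5 MByte DRAM at the bottom of the map, the handling above amounts to a range decode in the MMU. The sketch below lists which agent checks each address range and shows that unused space is refused inside the MMU. It is a reading aid with hypothetical names, not the MMU's actual decode logic.

```python
# (first address, last address, region, who checks the access permissions)
ADDRESS_MAP = [
    (0x0000_0000, 0x0027_FFFF, "DRAM", "MMU region registers"),
    (0x0028_0000, 0x0028_FFFF, "ROM", "ROM block"),
    (0x0029_0000, 0x0029_0FFF, "MMU registers", "MMU (supervisor data only)"),
    (0x0029_1000, 0x0029_FFFF, "Peripheral registers", "each peripheral"),
    (0x002A_0000, 0x002A_BFFF, "PCU mapped registers", "PCU (supervisor data only)"),
]


def classify(adr):
    """Return (region, checker) for a 32-bit CPU address."""
    if not 0 <= adr <= 0xFFFF_FFFF:
        raise ValueError("not a 32-bit address")
    for first, last, region, checker in ADDRESS_MAP:
        if first <= adr <= last:
            return region, checker
    # mmu_cpu_berr is asserted and no external access is initiated.
    return "Unused", "MMU bus error"


print(classify(0x0028_0010))  # ('ROM', 'ROM block')
```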

11.6.4 Reset exception vector and reference zero traps

When a reset occurs the LEON processor starts executing code from address 0x0000_0000. On SoPEC the
embedded DRAM occupies this area of the address map. As the DRAM contents are undefined when the
processor comes out of reset (this is certainly the case with a power-on and most other resets that can occur
on SoPEC), the MMU will need to redirect accesses from 0x0000_0000 through 0x0000_00?? (the minimum
amount of redirection is currently TBD but is likely to be at least 16 bytes) to the bottom of the ROM,
i.e. to 0x0028_0000 through 0x0028_00??.

A common software bug is zero-referencing or null pointer de-referencing (where the program attempts to
access the contents of address 0x0000_0000). To assist software debug, the MMU will assert a bus error
every time the reset locations are accessed after the reset trap handler has legitimately been retrieved
immediately after reset. If desired, this condition could result in a unique trap (e.g. a watchpoint
detected trap).
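The reset redirection and zero-reference check can be sketched as a small piece of MMU state. Two details are assumptions made only for illustration: the redirect window is taken as 16 bytes (the specification leaves the size TBD), and the reset handler is treated as legitimately retrieved once the first instruction fetch outside that window occurs. The class and method names are hypothetical.

```python
ROM_BASE = 0x0028_0000
REDIRECT_BYTES = 16  # hypothetical: the real window size is still TBD


class ResetRedirect:
    """Sketch of reset-vector redirection plus the null-reference bus error."""

    def __init__(self):
        self.reset_handler_done = False

    def translate(self, adr, is_fetch):
        """Return (physical address, bus_error) for one CPU access."""
        if adr < REDIRECT_BYTES:
            if is_fetch and not self.reset_handler_done:
                return ROM_BASE + adr, False  # reset code comes from the ROM
            return None, True                 # data access, or access after the reset handler
        if is_fetch:
            # Assumption: leaving the window ends the legitimate reset fetches.
            self.reset_handler_done = True
        return adr, False


mmu = ResetRedirect()
print(hex(mmu.translate(0x0, is_fetch=True)[0]))  # 0x280000
```

A real design might instead end the redirect on a specific event (for example a write to an MMU register by the boot code); the sketch only shows the shape of the check.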

11.6.5 MMU Configuration Registers

These are the only configuration registers in the CPU block. Note that all the MMU configuration registers 
may only be accessed when the CPU is running in supervisor mode. 



Table 18. MMU Configuration Registers

Address offset | Register | #Bits | Reset | Description
0x00 | Region0Bottom | 17 | 0x0_0000 | This register contains the physical address that marks the bottom of region 0
0x04 | Region0Top | 17 | 0xF_FFFF | This register contains the physical address that marks the top of region 0. Region 0 covers the entire address space after reset whereas all other regions are zero-sized initially.
0x08 | Region1Bottom | 17 | 0x0_0000 | This register contains the physical address that marks the bottom of region 1
0x0C | Region1Top | 17 | 0x0_0000 | This register contains the physical address that marks the top of region 1
0x10 | Region2Bottom | 17 | 0x0_0000 | This register contains the physical address that marks the bottom of region 2
0x14 | Region2Top | 17 | 0x0_0000 | This register contains the physical address that marks the top of region 2
0x18 | Region3Bottom | 17 | 0x0_0000 | This register contains the physical address that marks the bottom of region 3
0x1C | Region3Top | 17 | 0x0_0000 | This register contains the physical address that marks the top of region 3
0x20 | Region4Bottom | 17 | 0x0_0000 | This register contains the physical address that marks the bottom of region 4
0x24 | Region4Top | 17 | 0x0_0000 | This register contains the physical address that marks the top of region 4
0x28 | Region5Bottom | 17 | 0x0_0000 | This register contains the physical address that marks the bottom of region 5
0x2C | Region5Top | 17 | 0x0_0000 | This register contains the physical address that marks the top of region 5
0x30 | Region6Bottom | 17 | 0x0_0000 | This register contains the physical address that marks the bottom of region 6
0x34 | Region6Top | 17 | 0x0_0000 | This register contains the physical address that marks the top of region 6
0x38 | Region7Bottom | 17 | 0x0_0000 | This register contains the physical address that marks the bottom of region 7
0x3C | Region7Top | 17 | 0x0_0000 | This register contains the physical address that marks the top of region 7
0x40 | Region0Control | 6 | 0x07 | Control register for region 0
0x44 | Region1Control | 6 | 0x07 | Control register for region 1
0x48 | Region2Control | 6 | 0x07 | Control register for region 2
0x4C | Region3Control | 6 | 0x07 | Control register for region 3
0x50 | Region4Control | 6 | 0x07 | Control register for region 4
0x54 | Region5Control | 6 | 0x07 | Control register for region 5
0x58 | Region6Control | 6 | 0x07 | Control register for region 6
0x5C | Region7Control | 6 | 0x07 | Control register for region 7
0x60 | BusTimeout | 16 | 0x00FF | This register should be set to the number of pclk cycles to wait before aborting an access with a bus error.
0x64 | DebugSelect | 7 | 0x00 | Contains the address of the register selected for debug observation. It is expected that a number of pseudo-registers will be made available for debug observation and these will be outlined during the implementation phase.
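The BusTimeout register implies a simple watchdog on every access. The sketch below counts pclk cycles while waiting for a slave's ready signal and gives up with a bus error once BusTimeout cycles have elapsed. Exactly which cycle the abort lands on in hardware is not specified here, so the counting convention is an assumption, as is the function name.

```python
def wait_for_ready(slave_ready, bus_timeout=0x00FF):
    """Model the MMU waiting for a slave's ready signal.

    slave_ready(cycle) -> bool gives the slave's ready output on each pclk
    cycle (counting from 1). Returns ("rdy", cycle) on success, or
    ("berr", bus_timeout) when the access is aborted with a bus error.
    """
    if not 0 <= bus_timeout <= 0xFFFF:
        raise ValueError("BusTimeout is a 16-bit register")
    for cycle in range(1, bus_timeout + 1):
        if slave_ready(cycle):
            return "rdy", cycle
    return "berr", bus_timeout


print(wait_for_ready(lambda cycle: cycle >= 3))   # ('rdy', 3)
print(wait_for_ready(lambda cycle: False, 10))    # ('berr', 10)
```

With the reset value of 0x00FF, a slave that never responds would hold the CPU for 255 pclk cycles before the bus error is raised.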



11.6.5.1 RegionTop and RegionBottom registers

The 20 Mbit of embedded DRAM on SoPEC is arranged as 81920 words of 256 bits each. All region
boundaries need to align with a 256-bit word. Thus only 17 bits are required for the RegionNTop and
RegionNBottom registers. The byte address of these locations can be obtained by simply left-shifting the
register value by 5 bits, i.e. cpu_adr[21:0] = RegionNTop/Bottom[16:0] << 5.

Both the RegionNTop and RegionNBottom registers are inclusive, i.e. the addresses in the registers are
included in the region. The size of the smallest active region is therefore one 256-bit word, i.e. 32 bytes.

If DRAM regions overlap (there is no reason for this to be the case but there is nothing to prohibit it either)
then only accesses allowed by all overlapping regions are permitted. That is, if a DRAM address appears in
both Region1 and Region3 (for example), the cpu_acode of an access is checked against the access permissions
of both regions. If both regions permit the access then it will proceed, but if either or both regions do
not permit the access then it will not be allowed.
The MMU does not support negatively sized regions, i.e. the value of the RegionNTop register should
never be less than the value of the RegionNBottom register. If RegionNTop is lower in the address map
than RegionNBottom then the region is considered to be zero-sized and is ignored.

When both the RegionNTop and RegionNBottom registers for a region contain the same value, the region is
simply one 256-bit word in length, and this corresponds to the smallest possible active region.
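The region rules in this section can be summarised in a few lines: inclusive 256-bit word bounds, zero-sized regions when RegionNTop is below RegionNBottom, and overlapping regions that must all permit an access. The sketch uses the RegionNControl bit layout from section 11.6.5.2. Treating an address that falls in no region as refused is an assumption, since this section does not say what happens in that case. The names are hypothetical.

```python
READ, WRITE, FETCH = 0, 1, 2  # bit offsets within SupervisorAccess / UserAccess


def in_region(bottom, top, adr):
    """Inclusive bounds in 256-bit words; top < bottom makes the region zero-sized."""
    word = adr >> 5
    return bottom <= word <= top


def dram_access_allowed(regions, adr, supervisor, kind):
    """regions: iterable of (RegionNBottom, RegionNTop, RegionNControl) tuples."""
    bit = kind if supervisor else kind + 3
    hits = [ctrl for bottom, top, ctrl in regions if in_region(bottom, top, adr)]
    # Assumption: an address covered by no region is refused.
    return bool(hits) and all((ctrl >> bit) & 1 for ctrl in hits)


regions = [
    (0x000, 0x07F, 0b000_011),  # supervisor read/write only
    (0x600, 0x61F, 0b011_011),  # supervisor and user read/write
]
print(dram_access_allowed(regions, 0x0000_0040, supervisor=False, kind=READ))  # False
```

Using `all()` over every matching region is what implements the rule that an overlap can only narrow, never widen, the permitted accesses.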

11.6.5.2 Region Control registers

Each memory region has a control register associated with it. The RegionNControl register is used to set
the access conditions for the memory region bounded by the RegionNTop and RegionNBottom registers.
Table 19 describes the function of each bit field in the RegionNControl registers. All bits in a
RegionNControl register are both readable and writable by design. However, like all registers in the MMU, the
RegionNControl registers can only be accessed by code running in supervisor mode.



Table 19. Region Control Register

Field Name | Bit(s) | Description
SupervisorAccess | 2:0 | Denotes the type of access allowed when the CPU is running in Supervisor mode. For each access type a 1 indicates the access is permitted and a 0 indicates the access is not permitted.
    bit0 - Data read access permission
    bit1 - Data write access permission
    bit2 - Instruction fetch access permission
UserAccess | 5:3 | Denotes the type of access allowed when the CPU is running in User mode. For each access type a 1 indicates the access is permitted and a 0 indicates the access is not permitted.
    bit3 - Data read access permission
    bit4 - Data write access permission
    bit5 - Instruction fetch access permission


11.6.5.3 Status Register 

The SPARC V8 architecture allows for a number of types of memory access error to be trapped. These trap
types and trap handling in general are described in chapter 7 of the SPARC architecture manual [32].
According to the SPARC architecture manual, when a trap is taken the processor automatically moves to the
next register window (i.e. it decrements the current window pointer) and copies the program counters (PC and nPC) to
two local registers in the new window. The supervisor bit in the PSR is also set, and the PSR can be saved
to another local register by the trap handler (this does not happen automatically in hardware).

At the time of writing it is not clear whether the LEON core can easily accept memory access error trap
types (i.e. the 8-bit tt field of the Trap Base Register). Further investigation is needed to determine if this is
possible and if existing trap types will cover the different types of bus error possible on SoPEC. Up to 32
implementation-specific trap types are allowed, so conditions unique to SoPEC can be handled in this
manner.

If it is not possible for sufficient information about the cause of the bus error to be passed to the LEON
core using the above mechanisms, then a status register will be implemented to record the relevant
information.

11.6.6 MMU Sub-block partition

As can be seen from Figure 19 and Figure 20 the MMU consists of five principal sub-blocks. For clarity 
the connections between these sub-blocks and other SoPEC blocks and between each of the sub-blocks are 
shown in two separate diagrams. 
Figure 19. MMU Sub-block partition, external signal view



Doc: SoPEC.hardware_design 
Version: 2.3 



S3 Proprietary Document 



29 Nov 2002
Page 79 



SoPEC : Hardware Design 




[Block diagram. Sub-blocks: LEON Bridge, MMU Control Block, Configuration Registers, DIU Bus Interface, CPU Subsystem Bus Interface; ICache and RDU are also shown. Recoverable signals: cpu_start_access, cpu_ben[1:0], cpu_rwn, cpu_mmu_adr[31:0], cpu_dataout[31:0], mmu_cpu_data[31:0], mmu_cpu_rdy, cpu_mmu_acode[1:0], mmu_cpu_berr, ic_cache_hit, cpu_adr[21:0], dram_access_en, dram_rdy, dram_data[31:0], peri_mmu_data[31:0], peri_mmu_rdy, peri_mmu_berr.]

Figure 20. MMU Sub-block partition, internal signal view 



11.6.6.1 LEON Bridge

At the time of writing it is expected that the LEON core will be used with its AHB interface rather than be
modified to comply with the protocols used on SoPEC, in particular the DIU protocol for DRAM access.
The LEON bridge consists of an AHB bridge and some glue logic. The AHB bridge will convert between
the AHB and the DIU and CPU subsystem bus protocols. The AHB bridge will always be a slave on the
AHB. Glue logic will be required to assist with endianness coherency, interrupts and other miscellaneous
signalling.



Table 20. LEON bridge I/Os

Port name | Pins | I/O | Description
Global SoPEC signals
prst_n | 1 | In | Global reset. Synchronous to pclk, active low.
pclk | 1 | In | Global clock
LEON Bridge to AHB signals
haddr[31:0] | 32 | In | AHB address bus
hwdata[31:0] | 32 | In | AHB write data bus
hrdata[31:0] | 32 | Out | AHB read data bus
hsel | 1 | In | AHB slave select signal
hwrite | 1 | In | AHB write signal: 1 = Write access, 0 = Read access
htrans | 2 | In | Indicates the type of the current transfer: 00 = IDLE, 01 = BUSY, 10 = NONSEQ, 11 = SEQ
hsize | 3 | In | Indicates the size of the current transfer: 000 = Byte transfer, 001 = Halfword transfer, 010 = Word transfer, 011 = 64-bit transfer (unsupported?), 1xx = Unsupported larger wordsizes
hburst | 3 | In | Indicates if the current transfer forms part of a burst and the type of burst: 000 = SINGLE, 001 = INCR, 010 = WRAP4, 011 = INCR4, 100 = WRAP8, 101 = INCR8, 110 = WRAP16, 111 = INCR16
hprot | 4 | In | Protection control signals pertaining to the current access: hprot[0] = Opcode(0) / Data(1) access; hprot[1] = User(0) / Supervisor(1) access; hprot[2] = Non-bufferable(0) / Bufferable(1) access (unsupported); hprot[3] = Non-cacheable(0) / Cacheable(1) access
hmaster | 4 | In | Indicates the identity of the current bus master. This will always be the LEON core.
hmastlock | 1 | In | Indicates that the current master is performing a locked sequence of transfers.
hready | 1 | Out | Active high ready signal indicating the access has completed
hresp | 2 | Out | Indicates the status of the transfer: 00 = OKAY, 01 = ERROR, 10 = RETRY, 11 = SPLIT
hsplit | 16 | Out | This 16-bit split bus is used by a slave to indicate to the arbiter which bus masters should be allowed to attempt a split transaction. This feature will be unsupported on the AHB bridge.
Toplevel/Common LEON bridge signals
cpu_dataout[31:0] | 32 | Out | Data out bus to both DRAM and peripheral devices.
cpu_rwn | 1 | Out | Read/NotWrite signal. 1 = Current access is a read access. 0 = Current access is a write access.
icu_cpu_ilevel[3:0] | 4 | In | An interrupt is asserted by driving the appropriate priority level on icu_cpu_ilevel. These signals must remain asserted until the CPU executes an interrupt acknowledge cycle.
cpu_icu_ilevel[3:0] | 4 | Out | Indicates the level of the interrupt the CPU is acknowledging when cpu_iack is high.
cpu_iack | 1 | Out | Interrupt acknowledge signal. The exact timing depends on the CPU core implementation.
cpu_start_access | 1 | Out | Start Access signal indicating the start of a data transfer and that the cpu_adr, cpu_dataout, cpu_rwn and cpu_acode signals are all valid. This signal is only asserted during the first cycle of an access.
cpu_ben[1:0] | 2 | Out | Byte enable signals.
LEON core to LEON bridge signals
iui.irl | 4 | Out | Interrupt level request to the LEON Integer Unit
iuo.irl | 4 | In | Acknowledged interrupt level from the LEON Integer Unit
iuo.intack | 1 | In | Interrupt acknowledge signal from the LEON Integer Unit
LEON bridge to MMU Control Block signals
cpu_mmu_adr | 32 | Out | CPU Address Bus.
mmu_cpu_data | 32 | In | Data bus from the MMU
mmu_cpu_rdy | 1 | In | Ready signal from the MMU
cpu_mmu_acode | 2 | Out | Access code signals to the MMU
mmu_cpu_berr | 1 | In | Bus error signal from the MMU



Description: 

The LEON bridge must ensure that all CPU bus and interrupt transactions are functionally correct and that
the timing requirements are met. This sub-block is also responsible for ensuring endianness coherency, i.e.
guaranteeing that the correct data appears in the correct position on the data buses (hrdata, cpu_dataout
and mmu_cpu_data) for every type of access. This is a requirement because the LEON uses big-endian
addressing while the rest of SoPEC is little-endian.
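One way to picture the endianness problem is at the level of byte lanes on a 32-bit bus. The sketch below assumes the bridge keeps byte addresses invariant (a byte LEON writes to address A ends up at address A in little-endian SoPEC memory), which means moving each byte between the big-endian lane LEON uses and the little-endian lane the rest of SoPEC expects. Whether the actual bridge uses this convention is not stated in this section, and the function names are hypothetical.

```python
def be_lane(addr):
    """Byte lane (0 = bits 7:0) used by a big-endian master for byte address addr."""
    return 3 - (addr & 3)


def le_lane(addr):
    """Byte lane used by a little-endian bus for byte address addr."""
    return addr & 3


def swap_lanes(word):
    """Reverse the four byte lanes of a 32-bit word (big <-> little endian)."""
    return int.from_bytes(word.to_bytes(4, "big"), "little")


def bridge_byte_write(hwdata, addr):
    """Move a byte driven on LEON's big-endian lane onto the little-endian lane."""
    byte = (hwdata >> (8 * be_lane(addr))) & 0xFF
    return byte << (8 * le_lane(addr))


print(hex(swap_lanes(0x11223344)))                # 0x44332211
print(hex(bridge_byte_write(0x000000CD, 0x103)))  # 0xcd000000
```

Under this convention a full-word access is a plain lane reversal, while byte and halfword accesses need the address-dependent lane selection shown in `bridge_byte_write`.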

It is expected that some signals (especially those external to the CPU block) will need to be registered here 
to meet the timing requirements. Careful thought will be required to ensure that overall CPU access times 
are not excessively degraded by the use of too many register stages. 

11.6.6.2 DIU Bus interface 

The DIU bus interface will handle all valid accesses to the embedded DRAM via the DIU. The DIU bus 
interface ensures that the access conforms to the DIU bus protocol while the DIU manages the arbitration 
and data alignment. 



Table 21. DIU Bus Interface I/Os











Global SoPEC signals 


prst_n 


1 


In 


Global reset. Synchronous to pclk, active low.


pclk


1 


In 


Global clock


Toplevel/Common DIU Bus Interface signals


dram_cpu_data[255:0] 


256 


In 


Read data from the DRAM. 


cpu_diu_rreq 


1 


Out 


Read request to the DIU DRAM 


diu_cpu_rack


1 


In 


Acknowledge from DIU that read request has been accepted. 


diu_cpu_rvalid


1 


In 


Signal from DIU indicating that valid read data is on the
dram_cpu_data bus


cpu_diu_wreq


1 


Out 


Write request to the DIU 













diu_cpu_wack


1 


In 


Acknowledge from the DIU that the write request has been
accepted.


cpu_diu_wvalid


1 


Out 


Signal from the CPU to the DIU indicating that the data currently on
the cpu_dataout bus is valid


cpu_diu_wmask[1:0]


2 


Out 


Flag indicating format of CPU write to DRAM. These signals are
directly derived from the cpu_ben signals:

cpu_diu_wmask = 00: 8-bit write

cpu_diu_wmask = 01: 16-bit write

cpu_diu_wmask = 10: 32-bit write

cpu_diu_wmask = 11: reserved

cpu_adr[2:0] are driven in accordance with the width of the data
access indicated by cpu_diu_wmask. Addresses cannot cross a
256-bit word DRAM boundary.


dram_rdy


1 


Out 


Data Ready signal. Indicates the data on the dram_cpu_data bus is
valid for a read cycle or that the data was successfully dispatched
to the DIU for a write cycle.




DIU Bus Interface to MMU Control Block signals 




cpu_adr[21:0]


22 


In 


Toplevel CPU Address bus. 




dram_data[31:0]


32 


Out 


Data bus containing the 32 bits addressed by cpu_adr[4:2] from the
256-bit DRAM read bus dram_cpu_data




dram_access_en


1 


In 


Enable Access signal. A DRAM access cannot be initiated unless it
has been enabled by the MMU Control Unit.




DIU Bus Interface to ICache signals




ic_cache_hit 


1 


In 


Cache hit signal from the ICache. This indicates that the current
CPU read request is being serviced by the ICache and so should
not be retrieved from the DRAM.


DIU Bus Interface to LEON bridge signals


cpu_ben[1:0]


2 


In 


Byte enable signals from the LEON bridge. These are forwarded on
to the DIU as the cpu_diu_wmask signals.


cpu_start_access


1 


In 


Start Access signal from the LEON bridge indicating the start of a
data transfer and that the cpu_adr, cpu_dataout, cpu_rwn and
cpu_acode signals are all valid. This signal is only asserted during
the first cycle of an access.



Description: 

The DIU Bus Interface handles all data transfers between the CPU (or ICache) and the DIU. This involves
translating between the different protocols used on the DIU and CPU buses. The validity of an access (i.e.
whether the CPU is running in the correct mode for the address space being accessed) is determined by the
MMU Control Block, which also checks that a DRAM access does not cross a 256-bit boundary (as required by
the DIU) and asserts dram_access_en if the access is valid. Invalid accesses do not initiate DRAM
accesses. The operation of the DIU Bus Interface is described by the state machine shown in Figure 21 and
the DIU bus protocol is described in more detail in section 20.9. The DIU will return a 256-bit data word
on dram_cpu_data[255:0] for every read access. The DIU Bus Interface must select the appropriate 32-bit
word from this according to the word address given by cpu_adr[4:2].
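The word-selection step can be sketched as a shift and mask. The sketch assumes that word 0 of the 256-bit DRAM word occupies dram_cpu_data[31:0] and that word n occupies bits 32n+31:32n. The text above does not state this ordering. `select_word` is a hypothetical name.

```python
def select_word(dram_cpu_data: int, cpu_adr: int) -> int:
    """Pick the 32-bit word addressed by cpu_adr[4:2] out of a 256-bit DRAM word.

    Assumes word n of the DRAM word sits in bits 32n+31:32n of dram_cpu_data.
    """
    word_index = (cpu_adr >> 2) & 0x7  # cpu_adr[4:2]
    return (dram_cpu_data >> (32 * word_index)) & 0xFFFFFFFF
```

In hardware this corresponds to an 8:1 multiplexer on 32-bit words, steered by cpu_adr[4:2].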
[Figure residue, DIU Bus Interface state machine. Reset (prst_n = 0) sets cpu_diu_rreq, cpu_diu_wreq, cpu_diu_wvalid and dram_rdy to 0. Read path: cpu_start_access = 1 AND dram_access_en = 1 AND ic_cache_hit = 0 AND cpu_rwn = 1 sets cpu_diu_rreq = 1 (Read Access Initiated); diu_cpu_rack = 1 clears cpu_diu_rreq and configures the data select muxes according to cpu_adr[4:2] (Read Access Acknowledge); diu_cpu_rvalid = 1 leads to Read Access Complete. Write path: the same start condition with cpu_rwn = 0 sets cpu_diu_wreq = 1 (Write Access Initiated); diu_cpu_wack = 1 clears cpu_diu_wreq (Write Access Acknowledge); cpu_diu_wvalid = 1 and dram_rdy = 1 lead to Write Access Complete, after which cpu_diu_wvalid and dram_rdy return to 0.]

Figure 21. DIU Bus Interface state machine
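A minimal cycle-level model of Figure 21 may help when reading the state names. It is reconstructed only from the labels that can still be read in the figure. Several details are assumptions:

- the name of the idle state;
- the behaviour on an ICache hit (no DIU request is made);
- the unconditional one-cycle return from each Complete state.

Class and method names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    READ_INITIATED = auto()
    READ_ACKNOWLEDGE = auto()
    READ_COMPLETE = auto()
    WRITE_INITIATED = auto()
    WRITE_ACKNOWLEDGE = auto()
    WRITE_COMPLETE = auto()


class DiuBusInterface:
    """Sketch of the DIU Bus Interface state machine; step() models one pclk."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # prst_n == 0: all request, valid and ready outputs deasserted.
        self.state = State.IDLE
        self.cpu_diu_rreq = 0
        self.cpu_diu_wreq = 0
        self.cpu_diu_wvalid = 0
        self.dram_rdy = 0

    def step(self, *, cpu_start_access=0, dram_access_en=0, ic_cache_hit=0,
             cpu_rwn=1, diu_cpu_rack=0, diu_cpu_rvalid=0, diu_cpu_wack=0):
        s = self.state
        if s is State.IDLE:
            # Only valid DRAM accesses that miss the ICache go to the DIU.
            if cpu_start_access and dram_access_en and not ic_cache_hit:
                if cpu_rwn:
                    self.cpu_diu_rreq = 1
                    self.state = State.READ_INITIATED
                else:
                    self.cpu_diu_wreq = 1
                    self.state = State.WRITE_INITIATED
        elif s is State.READ_INITIATED:
            if diu_cpu_rack:
                self.cpu_diu_rreq = 0  # data select muxes set from cpu_adr[4:2]
                self.state = State.READ_ACKNOWLEDGE
        elif s is State.READ_ACKNOWLEDGE:
            if diu_cpu_rvalid:
                self.dram_rdy = 1
                self.state = State.READ_COMPLETE
        elif s is State.READ_COMPLETE:
            self.dram_rdy = 0
            self.state = State.IDLE
        elif s is State.WRITE_INITIATED:
            if diu_cpu_wack:
                self.cpu_diu_wreq = 0
                self.state = State.WRITE_ACKNOWLEDGE
        elif s is State.WRITE_ACKNOWLEDGE:
            # Present the write data and report it as dispatched.
            self.cpu_diu_wvalid = 1
            self.dram_rdy = 1
            self.state = State.WRITE_COMPLETE
        elif s is State.WRITE_COMPLETE:
            self.cpu_diu_wvalid = 0
            self.dram_rdy = 0
            self.state = State.IDLE
        return self.state
```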



11.6.6.3 CPU Subsystem Bus Interface 

The CPU Subsystem Interface block handles all valid accesses to the peripheral blocks that comprise the 
CPU Subsystem. 



Table 22. CPU Subsystem Bus Interface I/Os











Global SoPEC signals


prst_n


1


In 


Global reset. Synchronous to pclk, active low.


pclk 


1 


In 


Global clock


Toplevel/Common CPU Subsystem Bus Interface signals


cpu_cpr_sel 


1 


Out 


CPR block select 


cpu_gpio_sel


1 


Out 


GPIO block select.


cpu_icu_sel


1 


Out 


ICU block select.


cpu_lss_sel


1 


Out 


LSS block select.


cpu_pcu_sel 


1 


Out 


PCU block select. 











cpu_scb_sel




Out 


SCB block select. 


cpu_tim_sel


1 


Out 


Timers block select. 


cpu_rom_sel 


1 


Out 


ROM block select. 


cpu_pss_sel




Out 


PSS block select. 


cpu_diu_sel




Out 


DIU block select 


cpr_cpu_data[31:0]


32 


In 


Read data bus from the CPR block 


gpio_cpu_data[31:0]


32 


In 


Read data bus from the GPIO block


icu_cpu_data[31:0]


32 


In 


Read data bus from the ICU block


lss_cpu_data[31:0]


32 


In 


Read data bus from the LSS block 


pcu_cpu_data[31:0]


32 


In 


Read data bus from the PCU block


scb_cpu_data[31:0]


32 


In 


Read data bus from the SCB block 


tim_cpu_data[31:0]


32 


In 


Read data bus from the Timers block 


rom_cpu_data[31:0]


32 


In 


Read data bus from the ROM block 


pss_cpu_data[31:0]


32 


In 


Read data bus from the PSS block 


diu_cpu_data[31:0]


32 


In 


Read data bus from the DIU block


cpr_cpu_rdy


1 


In 


Ready signal to the CPU. When cpr_cpu_rdy is high it indicates the
last cycle of the access. For a write cycle this means cpu_dataout
has been registered by the CPR block and for a read cycle this
means the data on cpr_cpu_data is valid.


gpio_cpu_rdy


1


In


GPIO ready signal to the CPU.


icu_cpu_rdy




In 


ICU ready signal to the CPU.


lss_cpu_rdy 




In 


LSS ready signal to the CPU. 


pcu_cpu_rdy 




In 


PCU ready signal to the CPU. 


scb_cpu_rdy 




In 


SCB ready signal to the CPU.


tim_cpu_rdy




In


Timers block ready signal to the CPU. 


rom_cpu_rdy 


1


In 


ROM block ready signal to the CPU.


pss_cpu_rdy 




In 


PSS block ready signal to the CPU. 


diu_cpu_rdy




In 


DIU register block ready signal to the CPU.


cpr_cpu_berr




In 


Bus Error signal from the CPR block 


gpio_cpu_berr




In 


Bus Error signal from the GPIO block


icu_cpu_berr




In


Bus Error signal from the ICU block


lss_cpu_berr 




In


Bus Error signal from the LSS block


pcu_cpu_berr




In 


Bus Error signal from the PCU block


scb_cpu_berr




In 


Bus Error signal from the SCB block


tim_cpu_berr




In 


Bus Error signal from the Timers block 


rom_cpu_berr




In 


Bus Error signal from the ROM block


pss_cpu_berr




In 


Bus Error signal from the PSS block 


diu_cpu_berr




In


Bus Error signal from the DIU block 


CPU Subsystem Bus Interface to MMU Control Block signals 


cpu_adr[19:12]


8


In 


Toplevel CPU Address bus. Only bits 19-12 are required to decode
the peripheral address space.


peri_access_en


1 


In 


Enable Access signal. A peripheral access cannot be initiated
unless it has been enabled by the MMU Control Unit.











peri_mmu_data[31:0]


32 


Out 


Data bus from the selected peripheral 


peri_mmu_rdy


1 


Out 


Data Ready signal. Indicates the data on the peri_mmu_data bus is
valid for a read cycle or that the data was successfully written to the
peripheral for a write cycle.


peri_mmu_berr 


1 


Out 


Bus Error signal. Indicates a bus error has occurred in accessing 
the selected peripheral 


CPU Subsystem Bus Interface to LEON bridge signals 


cpu_start_access


1 


In


Start Access signal from the LEON bridge indicating the start of a
data transfer and that the cpu_adr, cpu_dataout, cpu_rwn and
cpu_acode signals are all valid. This signal is only asserted during
the first cycle of an access.



Description: 

The CPU Subsystem Bus Interface block performs simple address decoding to select a peripheral and
multiplexing of the returned signals from the various peripheral blocks. The base addresses used for the
decode operation are defined in Table 16. Note that accesses to the MMU configuration registers are handled
by the MMU Control Block rather than the CPU Subsystem Bus Interface block. The CPU Subsystem Bus
Interface block operation is described by the following pseudocode:

masked_cpu_adr = cpu_adr[19:12]
case (masked_cpu_adr)
  when TIM_base[19:12]
    cpu_tim_sel = peri_access_en   // The peri_access_en signal will have the
                                   // timing required for block selects
    peri_mmu_data = tim_cpu_data
    peri_mmu_rdy = tim_cpu_rdy
    peri_mmu_berr = tim_cpu_berr
    all_other_selects = 0          // Shorthand to ensure other cpu_block_sel signals
                                   // remain deasserted
  when LSS_base[19:12]
    cpu_lss_sel = peri_access_en
    peri_mmu_data = lss_cpu_data
    peri_mmu_rdy = lss_cpu_rdy
    peri_mmu_berr = lss_cpu_berr
    all_other_selects = 0
  when GPIO_base[19:12]
    cpu_gpio_sel = peri_access_en
    peri_mmu_data = gpio_cpu_data
    peri_mmu_rdy = gpio_cpu_rdy
    peri_mmu_berr = gpio_cpu_berr
    all_other_selects = 0
  when SCB_base[19:12]
    cpu_scb_sel = peri_access_en
    peri_mmu_data = scb_cpu_data
    peri_mmu_rdy = scb_cpu_rdy
    peri_mmu_berr = scb_cpu_berr
    all_other_selects = 0
  when ICU_base[19:12]
    cpu_icu_sel = peri_access_en
    peri_mmu_data = icu_cpu_data
    peri_mmu_rdy = icu_cpu_rdy
    peri_mmu_berr = icu_cpu_berr
    all_other_selects = 0
  when CPR_base[19:12]
    cpu_cpr_sel = peri_access_en
    peri_mmu_data = cpr_cpu_data
    peri_mmu_rdy = cpr_cpu_rdy
    peri_mmu_berr = cpr_cpu_berr
    all_other_selects = 0
  when ROM_base[19:12]
    cpu_rom_sel = peri_access_en
    peri_mmu_data = rom_cpu_data
    peri_mmu_rdy = rom_cpu_rdy
    peri_mmu_berr = rom_cpu_berr
    all_other_selects = 0
  when PSS_base[19:12]
    cpu_pss_sel = peri_access_en
    peri_mmu_data = pss_cpu_data
    peri_mmu_rdy = pss_cpu_rdy
    peri_mmu_berr = pss_cpu_berr
    all_other_selects = 0
  when DIU_base[19:12]
    cpu_diu_sel = peri_access_en
    peri_mmu_data = diu_cpu_data
    peri_mmu_rdy = diu_cpu_rdy
    peri_mmu_berr = diu_cpu_berr
    all_other_selects = 0
  when PCU_base[19:12]
    cpu_pcu_sel = peri_access_en
    peri_mmu_data = pcu_cpu_data
    peri_mmu_rdy = pcu_cpu_rdy
    peri_mmu_berr = pcu_cpu_berr
    all_other_selects = 0
  when others
    all_block_selects = 0
    peri_mmu_data = 0x00000000
    peri_mmu_rdy = 0
    peri_mmu_berr = 1
end case
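The case statement above is equivalent to a table lookup on cpu_adr[19:12], and the sketch below expresses it that way. The real base addresses are defined in Table 16 and are not reproduced here, so the caller supplies them. The function and type names are hypothetical, and any addresses used with it are placeholders.

```python
from typing import Dict, Mapping, NamedTuple


class PeriResponse(NamedTuple):
    data: int  # <block>_cpu_data[31:0]
    rdy: int   # <block>_cpu_rdy
    berr: int  # <block>_cpu_berr


def decode_peripheral(cpu_adr: int, peri_access_en: int,
                      bases: Mapping[str, int],
                      responses: Mapping[str, PeriResponse]):
    """Return (selects, peri_mmu_data, peri_mmu_rdy, peri_mmu_berr).

    Only bits 19:12 of each base address are compared, as in the case
    statement. Unmatched addresses deselect every block and report a bus
    error, mirroring the 'when others' branch.
    """
    masked = (cpu_adr >> 12) & 0xFF  # cpu_adr[19:12]
    selects: Dict[str, int] = {name: 0 for name in bases}
    for name, base in bases.items():
        if masked == (base >> 12) & 0xFF:
            selects[name] = peri_access_en  # block select follows peri_access_en
            r = responses[name]
            return selects, r.data, r.rdy, r.berr
    return selects, 0x00000000, 0, 1
```

A dictionary keyed by block name stands in for the individual cpu_<block>_sel wires. All entries other than the matched one remain 0, which corresponds to `all_other_selects = 0`.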



11.6.6.4 MMU Control Block

The MMU Control Block determines whether every CPU access is a valid access. No more than one cycle
is to be consumed in determining the validity of an access and all accesses must terminate with the
assertion of either mmu_cpu_rdy or mmu_cpu_berr. To safeguard against stalling the CPU, a simple bus timeout
mechanism will be supported.



Table 23. MMU Control Block I/Os









Global SoPEC signals


prst_n 


1 


In 


Global reset. Synchronous to pclk, active low.


pclk


1 


In 


Global clock


Toplevel/Common MMU Control Block signals 


cpu_adr[21:0]


22 


Out 


Address bus for both DRAM and peripheral access. 


cpu_acode[1:0]


2 


Out 


CPU access code signals (cpu_mmu_acode) retimed to meet the
CPU Subsystem Bus timing requirements


dram_access_en


1 


Out 


DRAM Access Enable signal. Indicates that the current CPU
access is a valid DRAM access. 


MMU Control Block to LEON bridge signals 











cpu_mmu_adr[31:0]


32 


In


CPU core address bus. 


cpu_dataout[31:0]


32 


In 


Toplevel CPU data bus 


mmu_cpu_data[31:0]


32 


Out 


Data bus to the CPU core. Carries the data for all CPU read
operations.


cpu_rwn 


1 


In 


Toplevel CPU Read/notWrite signal.


cpu_mmu_acode[1:0]


2 


In 


CPU access code signals 


mmu_cpu_rdy 


1 


Out 


Ready signal to the CPU core. Indicates the completion of all valid
CPU accesses.


mmu_cpu_berr


1 


Out 


Bus Error signal to the CPU core. This signal is asserted to
terminate an invalid access.


cpu_start_access


1 


In 


Start Access signal from the LEON bridge indicating the start of a
data transfer and that the cpu_adr, cpu_dataout, cpu_rwn and
cpu_acode signals are all valid. This signal is only asserted during
the first cycle of an access.


cpu_iack


1 


In 


Interrupt Acknowledge signal from the CPU. This signal is only
asserted during an interrupt acknowledge cycle.


cpu.ben[1:0] 


2 


In 


Byte enable signals indicating which bytes of the 32-bit bus are
being accessed.


MMU Control Block to DIU Bus Interface signals


dram_rdy 


1 


In 


Data Ready signal. Indicates the data on the dram_cpu_data bus is
valid for a read cycle or that the data was successfully dispatched
to the DIU for a write cycle.


MMU Control Block to ICache signals 


ic_data[31:0]


32 


In


Data bus from the ICache


ic_rdy


1 


In


Ready signal from the ICache indicating the data on ic_data is valid


MMU Control Block to CPU Subsystem Bus Interface signals 


peri_access_en


1 


Out 


Enable Access signal. A peripheral access cannot be initiated
unless it has been enabled by the MMU Control Unit


peri_mmu_data[31:0]


32 


In 


Data bus from the selected peripheral 


peri_mmu_rdy


1 


In 


Data Ready signal. Indicates the data on the peri_mmu_data bus is
valid for a read cycle or that the data was successfully written to the
peripheral for a write cycle.


peri_mmu_berr


1 


In 


Bus Error signal. Indicates a bus error has occurred in accessing
the selected peripheral



Description: 

The MMU Control Block is responsible for the MMU's core functionality, namely determining whether or
not an access to any part of the address map is valid. An access is considered valid if it is to a mapped area
of the address space and if the CPU is running in the appropriate mode for that address space. Furthermore
the MMU Control Block must correctly handle the following special cases: an interrupt acknowledge cycle, a
reset exception vector fetch, an access that crosses a 256-bit DRAM word boundary and a bus timeout
condition. The following pseudocode shows the logic required to implement the MMU Control Block
functionality. It does not deal with the timing relationships of the various signals; it is the designer's
responsibility to ensure that these relationships are correct and comply with the different bus protocols.
For simplicity the pseudocode is split up into numbered sections so that the functionality may be seen
more easily.
PS0 Description: This first segment of code defines a number of constants and variables that are used
elsewhere in this description. Most signals have been defined in the I/O descriptions of the MMU
sub-blocks that precede this section of the document. The post_reset_state variable is used later (in section
PS4) to determine if we should translate the reset exception vector address or trap a null pointer access.

PS0:

const UnusedBottom = 0x002AC000
const DRAMTop = 0x0027FFFF
const UserDataSpace = b01
const UserProgramSpace = b00
const SupervisorDataSpace = b11
const SupervisorProgramSpace = b10

const timeout_limit = 0x40 // Need to confirm that this is a suitable value
const ResetExceptionCycles = 0x8

cpu_adr_peri_masked[7:0] = cpu_mmu_adr[19:12]
cpu_adr_dram_masked[16:0] = cpu_mmu_adr & 0x003FFFE0

if (prst_n == 0) then // Initialise everything
  cpu_adr = cpu_mmu_adr[21:0]
  peri_access_en = 0
  dram_access_en = 0
  mmu_cpu_data = peri_mmu_data
  mmu_cpu_rdy = 0
  mmu_cpu_berr = 0
  post_reset_state = TRUE
  access_initiated = FALSE
  cpu_access_cnt = 0

// The following is used to determine if we are coming out of reset for the purposes of
// reset exception vector redirection. There may be a convenient signal in the CPU core
// that we could use instead of this.

if ((cpu_start_access == 1) AND (cpu_access_cnt < ResetExceptionCycles) AND
    (clock_tick == TRUE)) then
  cpu_access_cnt = cpu_access_cnt + 1
else
  post_reset_state = FALSE

PS1 Description: This section is at the top of the hierarchy that determines the validity of an access. The
address is tested to see which macro-region (i.e. Unused, CPU Subsystem or DRAM) it falls into or
whether the reset exception vector is being accessed.

PS1:

if (cpu_mmu_adr >= UnusedBottom) then
  // The access is to an invalid area of the address space. See section PS2
elsif ((cpu_mmu_adr > DRAMTop) AND (cpu_mmu_adr < UnusedBottom)) then
  // We are in the CPU Subsystem/PEP Subsystem address space. See section PS3
// Only remaining possibility is an access to DRAM address space
// First we need to intercept the special case for the reset exception vector
elsif (cpu_mmu_adr < 0x00000010) then
  // The reset exception is being accessed. See section PS4
elsif ((cpu_adr_dram_masked >= Region0Bottom) AND (cpu_adr_dram_masked <=
    Region0Top)) then
  // We are in Region0. See section PS5
elsif ((cpu_adr_dram_masked >= RegionNBottom) AND (cpu_adr_dram_masked <=
    RegionNTop)) then // we are in RegionN
  // Repeat the Region0 (i.e. section PS5) logic for each of Region1 to Region7
else // We could end up here if there were gaps in the DRAM regions
  peri_access_en = 0
  dram_access_en = 0
  mmu_cpu_berr = 1 // we have an unknown access error, most likely due to hitting
  mmu_cpu_rdy = 0  // a gap in the DRAM regions

// Only thing remaining is to implement a bus timeout function. This is done in PS6

end
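The PS1 dispatch can be summarised as a classification function. This is a sketch under two assumptions. The UnusedBottom value is the one reconstructed in PS0 from a damaged scan, so it should be checked against the address map. The DRAM region bounds are supplied by the caller, because the RegionNBottom/RegionNTop values are not given here. The function name is hypothetical.

```python
from typing import Sequence, Tuple

DRAM_TOP = 0x0027FFFF        # DRAMTop (PS0)
UNUSED_BOTTOM = 0x002AC000   # UnusedBottom (PS0, reconstructed value)
DRAM_WORD_MASK = 0x003FFFE0  # mask used to form cpu_adr_dram_masked


def classify_access(cpu_mmu_adr: int,
                    regions: Sequence[Tuple[int, int]]) -> str:
    """Return which PS section handles cpu_mmu_adr.

    regions holds (RegionNBottom, RegionNTop) pairs, compared against the
    masked DRAM address as in PS1.
    """
    if cpu_mmu_adr >= UNUSED_BOTTOM:
        return "unused"          # PS2: bus error
    if cpu_mmu_adr > DRAM_TOP:
        return "peripheral"      # PS3: CPU/PEP Subsystem
    if cpu_mmu_adr < 0x00000010:
        return "reset_vector"    # PS4
    masked = cpu_mmu_adr & DRAM_WORD_MASK
    for n, (bottom, top) in enumerate(regions):
        if bottom <= masked <= top:
            return f"region{n}"  # PS5 logic for RegionN
    return "gap"                 # bus error: not inside any DRAM region
```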



PS2 Description: Accesses to the large unused area of the address space are trapped by this section. No
bus transactions are initiated and the mmu_cpu_berr signal is asserted.

PS2:

if (cpu_mmu_adr >= UnusedBottom) then
  peri_access_en = 0 // The access is to an invalid area of the address space
  dram_access_en = 0
  mmu_cpu_berr = 1
  mmu_cpu_rdy = 0

PS3 Description: This section deals with accesses to CPU Subsystem peripherals, including the MMU
itself. If the MMU registers are being accessed then no external bus transactions are required. Access to
the MMU registers is only permitted if the CPU is making a data access from supervisor mode; otherwise
a bus error is asserted and the access terminated. For non-MMU accesses, transactions occur over the
CPU Subsystem Bus and each peripheral is responsible for determining whether or not the CPU is in the
correct mode (based on the cpu_acode signals) to be permitted access to its registers. Note that all of the
PEP registers are accessed via the PCU, which is on the CPU Subsystem Bus.



PS3:

elsif ((cpu_mmu_adr > DRAMTop) AND (cpu_mmu_adr < UnusedBottom)) then
  // We are in the CPU Subsystem/PEP Subsystem address space
  cpu_adr = cpu_mmu_adr[21:0]
  if (cpu_adr_peri_masked == MMU_base) then // access is to local registers
    peri_access_en = 0
    dram_access_en = 0
    if (cpu_acode == SupervisorDataSpace) then
      i = cpu_mmu_adr[6:2] // MMUReg[i] selects the addressed register, one of the Table 18 registers
      if (i < 26) then
        if (cpu_rwn == 1) then
          mmu_cpu_data[16:0] = MMUReg[i]
          mmu_cpu_rdy = 1
          mmu_cpu_berr = 0
        else // write cycle
          MMUReg[i] = cpu_dataout[16:0]
          mmu_cpu_rdy = 1
          mmu_cpu_berr = 0
      else // there is no register mapped to this address
        mmu_cpu_berr = 1 // do we really want a bus_error here as registers
        mmu_cpu_rdy = 0  // are just mirrored in other blocks
    else // we have an access violation
      mmu_cpu_berr = 1
      mmu_cpu_rdy = 0
  else // access is to something else on the CPU Subsystem Bus
    peri_access_en = 1
    dram_access_en = 0
    mmu_cpu_data = peri_mmu_data
    mmu_cpu_rdy = peri_mmu_rdy
    mmu_cpu_berr = peri_mmu_berr

PS4 Description: The only correct accesses to the locations beneath 0x00000010 are fetches of the reset
trap handling routine and these should be the first accesses after reset. Here we trap all other accesses to
these locations regardless of the CPU mode. The most likely cause of such an access will be the use of a
null pointer in the program executing on the CPU.

PS4:

elsif (cpu_mmu_adr < 0x00000010) then // may need to translate a wider range - depends
  if (post_reset_state == TRUE) then  // on how LEON handles the reset exception.
    cpu_adr[21:0] = {ROM_base[21:3], cpu_mmu_adr[2:0]}
    peri_access_en = 1
    dram_access_en = 0
    mmu_cpu_data = peri_mmu_data
    mmu_cpu_rdy = peri_mmu_rdy
    mmu_cpu_berr = peri_mmu_berr
  else // we have a problem (almost certainly a null pointer)
    peri_access_en = 0
    dram_access_en = 0
    mmu_cpu_berr = 1
    mmu_cpu_rdy = 0

PS5 Description: This large section of pseudocode simply checks whether the access is within the bounds
of DRAM Region0 and if so whether or not the access is of a type permitted by the Region0Control
register. If the access is permitted then a DRAM access is initiated for all data accesses and for instruction
fetches that result in a cache miss. All instruction fetches are returned via the ICache interface regardless
of whether they come from a cache hit or refill from DRAM. If the access is not of a type permitted by the
Region0Control register then the access is terminated with a bus error.

PS5:

elsif ((cpu_adr_dram_masked >= Region0Bottom) AND (cpu_adr_dram_masked <=
    Region0Top)) then // we are in Region0

  // We need to check that the DRAM access does not cross a 256-bit boundary
  // Only 16 or 32-bit CPU accesses are capable of traversing a 256-bit boundary
  if (((cpu_mmu_adr[4:0] == 0x1F) AND ((cpu_ben == b01) OR (cpu_ben == b10)))
      OR ((cpu_mmu_adr[4:0] == 0x1E) AND (cpu_ben == b10))
      OR ((cpu_mmu_adr[4:0] == 0x1D) AND (cpu_ben == b10))) then
    peri_access_en = 0
    dram_access_en = 0
    mmu_cpu_berr = 1
    mmu_cpu_rdy = 0
  else // access does not cross 256-bit boundary so we can proceed
    cpu_adr = cpu_mmu_adr[21:0]
    if (cpu_rwn == 1) then
      if ((cpu_acode == SupervisorProgramSpace AND Region0Control[2] == 1)
          OR (cpu_acode == UserProgramSpace AND Region0Control[5] == 1)) then
        // this is a valid instruction fetch from Region0
        peri_access_en = 0
        dram_access_en = 1
        mmu_cpu_data = ic_data
        mmu_cpu_rdy = ic_rdy
        mmu_cpu_berr = 0
      elsif ((cpu_acode == SupervisorDataSpace AND Region0Control[0] == 1)
          OR (cpu_acode == UserDataSpace AND Region0Control[3] == 1)) then
        // this is a valid read access from Region0
        peri_access_en = 0
        dram_access_en = 1
        mmu_cpu_data = dram_data // possibly drc_data if dcache is used
        mmu_cpu_rdy = dram_rdy   // possibly drc_rdy
        mmu_cpu_berr = 0
      else // we have an access violation
        peri_access_en = 0
        dram_access_en = 0
        mmu_cpu_berr = 1
        mmu_cpu_rdy = 0
    else // it is a write access
      if ((cpu_acode == SupervisorDataSpace AND Region0Control[1] == 1)
          OR (cpu_acode == UserDataSpace AND Region0Control[4] == 1)) then
        // this is a valid write access to Region0
        peri_access_en = 0
        dram_access_en = 1
        mmu_cpu_rdy = dram_rdy // possibly dwc_rdy if dcache is used
        mmu_cpu_berr = 0
      else // we have an access violation
        peri_access_en = 0
        dram_access_en = 0
        mmu_cpu_berr = 1
        mmu_cpu_rdy = 0
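Both checks in PS5 can be written more compactly, under two assumptions:

- cpu_ben uses the encoding implied by the cpu_diu_wmask description in Table 21 (b00 = 8-bit, b01 = 16-bit, b10 = 32-bit);
- Region0Control has the bit assignments used in PS5.

Given those assumptions, the test `offset + size > 32` is equivalent to the three explicit boundary cases listed above. Function and constant names are hypothetical.

```python
ACCESS_BYTES = {0b00: 1, 0b01: 2, 0b10: 4}  # assumed cpu_ben encoding (b11 reserved)

USER_PROGRAM, USER_DATA = 0b00, 0b01                # cpu_acode values from PS0
SUPERVISOR_PROGRAM, SUPERVISOR_DATA = 0b10, 0b11


def crosses_256_bit_boundary(cpu_mmu_adr: int, cpu_ben: int) -> bool:
    """True if the access would span two 256-bit (32-byte) DRAM words."""
    offset = cpu_mmu_adr & 0x1F  # cpu_mmu_adr[4:0]
    return offset + ACCESS_BYTES[cpu_ben] > 32


def region_permits(control: int, cpu_acode: int, cpu_rwn: int) -> bool:
    """Check a RegionNControl value: [0] supervisor read, [1] supervisor write,
    [2] supervisor fetch, [3] user read, [4] user write, [5] user fetch."""
    if cpu_rwn:
        bit = {SUPERVISOR_PROGRAM: 2, USER_PROGRAM: 5,
               SUPERVISOR_DATA: 0, USER_DATA: 3}[cpu_acode]
    else:
        bit = {SUPERVISOR_DATA: 1, USER_DATA: 4}.get(cpu_acode)
        if bit is None:
            return False  # PS5 treats writes to program space as violations
    return bool((control >> bit) & 1)
```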



PS6 Description: This final section of pseudocode deals with the special case of a bus timeout. This
occurs when an access has been initiated but has not completed before the timeout_limit number of pclk
cycles. While access to both DRAM and CPU/PEP Subsystem registers will take a variable number of
cycles (due to DRAM traffic, PCU command execution or the different timing required to access registers
in imported IP), each access should complete before the timeout_limit occurs. Therefore it should not be
possible to stall the CPU by locking either the CPU Subsystem or DIU buses. However, given the fatal
effect such a stall would have, it is considered prudent to implement bus timeout detection.

PS6:

// Only thing remaining is to implement a bus timeout function.

if (cpu_start_access == 1) then
  access_initiated = TRUE
  timeout_countdown = timeout_limit

if ((mmu_cpu_rdy == 1) OR (mmu_cpu_berr == 1)) then
  access_initiated = FALSE
  peri_access_en = 0
  dram_access_en = 0

if ((clock_tick == TRUE) AND (access_initiated == TRUE)) then
  if (timeout_countdown > 0) then
    timeout_countdown--
  else // timeout has occurred
    peri_access_en = 0 // abort the access
    dram_access_en = 0
    mmu_cpu_berr = 1
    mmu_cpu_rdy = 0
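The PS6 countdown acts as a small per-access watchdog. The sketch models one call per clock tick. One detail is an assumption: access_initiated is cleared when the timeout fires, so only one bus error is reported. PS6 itself relies on the asserted mmu_cpu_berr to clear it on the following cycle. The class name is hypothetical.

```python
class BusTimeout:
    """Sketch of the PS6 bus timeout counter; step() models one clock tick."""

    def __init__(self, timeout_limit: int = 0x40) -> None:
        self.timeout_limit = timeout_limit  # timeout_limit constant from PS0
        self.access_initiated = False
        self.countdown = 0

    def step(self, cpu_start_access: int, mmu_cpu_rdy: int, mmu_cpu_berr: int) -> bool:
        """Return True if the current access must be aborted with a bus error."""
        if cpu_start_access:
            self.access_initiated = True
            self.countdown = self.timeout_limit
        if mmu_cpu_rdy or mmu_cpu_berr:
            self.access_initiated = False  # access terminated normally
        if self.access_initiated:
            if self.countdown > 0:
                self.countdown -= 1
            else:
                self.access_initiated = False  # timeout: assert mmu_cpu_berr once
                return True
        return False
```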
11.6.6.5 ICache

The ICache sub-block implementation is described in section 11.7.1.1.



11.7 Cache 

The decision on what type of caching solution to use on SoPEC is still open for the moment. There are
two probable solutions: a) use the LEON caches with a minimal configuration (1 KB I and D caches) and
b) use separate, simple one-line 256-bit caches for instruction, data read and data write accesses. From a
performance and (most likely) implementation point of view the LEON caches are the best solution;
however, they are much bigger than the one-line caches (approx. 6x). The one-line caches do not offer the same
degree of performance improvement as the LEON caches and are likely to add an extra cycle to all
memory accesses. The performance penalty for a LEON cache miss (i.e. for all memory accesses if we are not
using the LEON caches) and the best and worst case access times from DRAM have yet to be fully
determined. The final decision on which caching solution to use will be made when all such information is
available.

Therefore the section on caches, which was present in previous versions of this document but is now
mostly out of date, has been removed (the ICache is still relevant if one-line caches are used and so is
retained).

11.7.1 Instruction Cache 

A caching mechanism would offer the advantage of greater aggregate performance while still guaranteeing
a minimum level of performance. While greater performance may not be required at present for this
application, the caching mechanism offers greater efficiency (i.e. MIPS/MHz) and so the CPU clock could be
reduced without affecting, or only negligibly affecting, the operating performance. The advantage here is
that the design is scalable: better performance can be achieved by simply increasing the clock rate.

As all reads from the embedded DRAM on SoPEC produce words that are 256 bits wide, it is inefficient to
hook this up to a 32-bit CPU bus as 224 bits of each read would be discarded. If the full 256-bit word is
stored locally to the CPU as a single-line cache then a ??x performance improvement could be obtained in
the typical case (this is of course highly code dependent). This single-line cache would be very easy to
implement as it would just involve comparing the address to a single tag, and no replacement
algorithm would be required. Furthermore the area impact would be minor and there should be no performance
penalty for cache misses. As the dram_cpu_data bus is 256 bits wide the requested word is immediately
available to the CPU, i.e. we do not need to perform critical-word-first reordering of the data.

The instruction cache is only accessed for instruction fetches, not all CPU reads. These can be
differentiated by signals emanating from the CPU. Non-instruction CPU reads would be supported by the data
cache. In the case of a cache miss the read request is processed by the MMU to ensure the request is valid
before a read request is generated on the relevant external (to the CPU block) bus. The MMU should be
informed of a cache hit to ensure it does not generate an unnecessary read request. This requires that the
regions used to store code are aligned on 32-byte (256-bit) boundaries.

As there is no requirement for more time-deterministic code execution, the instruction cache cannot be
disabled.

11.7.1.1 ICache implementation

The Instruction Cache used in SoPEC is capable of storing just a single 256-bit DRAM word. An
implementation is depicted in Figure 22 below. The block I/Os are given in Table 24 and these should be viewed
in conjunction with Figure 19 and Figure 20 for a complete depiction of the connectivity of the block.
Figure 22. ICache Block Diagram






Table 24. ICache I/Os











Global SoPEC signals 


prst_n 


1 


In 


Global reset. Synchronous to pclk, active low.


pclk


1 


In 


Global clock


Toplevel ICache signals 


dram_cpu_data[255:0]


256 


In 


Data bus from the DIU


cpu_acode[1:0]


2 


In 


CPU access control signals 


cpu_adr[21:2]


20 


In 


CPU core address bus. 


ICache to DIU Bus Interface signals


ic_cache_hit


1 


Out 


Cache hit signal. This indicates that the current CPU read request
is being serviced by the ICache and so should not be retrieved from
the DRAM.


dram_rdy 


1 


In 


Data Ready signal. Indicates the data on the dram_cpu_data bus is
valid.


ICache to MMU Control Block signals 


ic_data[31:0]


32 


Out 


ICache data bus




ic_rdy



1



Out



Ready signal from the ICache indicating the data on ic_data is valid



dram_access_en



1 



Out 



DRAM access enable signal. Indicates that the current CPU access
is a valid DRAM access.



The Tag stores the DRAM word address of the word currently in cache. The Tag contents are compared with cpu_adr[21:5] each time the CPU requests an instruction fetch from a valid DRAM address (indicated by cpu_acode[0] and dram_access_en). If a match occurs (i.e. a cache hit) the access is serviced by returning the correct 32 bits (as selected by cpu_adr[4:2]) to the MMU Control Block. If a match does not occur (i.e. a cache miss) the ic_cache_hit line is held low, indicating to the DIU Bus Interface that a DRAM access should commence. Completion of the DRAM access is signalled by the assertion of dram_rdy, and this causes the ICache contents to be updated, the Tag value replaced and the relevant 32 bits forwarded to the CPU accompanied by the assertion of the ic_rdy signal. The Tag is therefore updated each time the cache line is refilled from DRAM. All instruction fetches from DRAM are cacheable, regardless of which DRAM region is being accessed (although the access permissions still need to match those programmed for the region) and whether the CPU is in user or supervisor mode.
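The hit/miss behaviour described above can be illustrated with a small behavioural model. This is a minimal Python sketch, not the RTL: it assumes a single 256-bit cache line (suggested by the 256-bit dram_cpu_data bus and the cpu_adr[4:2] word select), models DRAM as a hypothetical dictionary keyed by cpu_adr[21:5], and ignores both the cpu_acode[0]/dram_access_en qualification and the DRAM access latency.

```python
class SingleLineICache:
    """Behavioural sketch of a one-line instruction cache (not the RTL)."""

    def __init__(self, dram):
        # dram: hypothetical model mapping a 256-bit DRAM word address
        # (cpu_adr[21:5]) to a list of eight 32-bit instruction words.
        self.dram = dram
        self.tag = None      # DRAM word address of the line currently cached
        self.line = None
        self.refills = 0

    def fetch(self, cpu_adr):
        """Return (instruction, hit) for an instruction fetch at byte address cpu_adr."""
        tag = (cpu_adr >> 5) & 0x1FFFF   # cpu_adr[21:5], 17 bits
        word = (cpu_adr >> 2) & 0x7      # cpu_adr[4:2] selects one of 8 words
        hit = tag == self.tag            # corresponds to ic_cache_hit
        if not hit:
            # Miss: load the whole 256-bit line (completion is signalled by
            # dram_rdy in hardware), then replace the Tag value.
            self.line = list(self.dram[tag])
            self.tag = tag
            self.refills += 1
        return self.line[word], hit


if __name__ == "__main__":
    cache = SingleLineICache({0: list(range(8)), 1: list(range(100, 108))})
    print(cache.fetch(0x04))  # (1, False): miss, line 0 loaded
    print(cache.fetch(0x1C))  # (7, True): same line, word 7
    print(cache.fetch(0x24))  # (101, False): miss, line 1 loaded, word 1
```

Sequential fetches within one 32-byte line hit after the first miss, which is why a single line already helps straight-line code.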



11.7.2 Data Cache

11.8 Realtime Debug Unit (RDU)

The RDU facilitates the realtime observation of the contents of most of the CPU-addressable registers in the SoPEC device, in addition to some pseudo-registers. The contents of pseudo-registers, i.e. registers that are collections of otherwise unobservable signals and that do not affect the functionality of a circuit, are defined in each block as required. Many blocks do not have pseudo-registers and some blocks (e.g. ROM, PSS) do not make debug information available to the RDU as it would be of little value in realtime debug.

Each block that supports realtime debug observation features a DebugSelect register that controls a local mux to determine which register is output on the block's data bus (i.e. block_cpu_data). One small drawback of reusing the block's data bus is that the debug data cannot be present on the same bus during a CPU read from the block. An accompanying active high block_cpu_debug_valid signal is used to indicate when the data bus contains valid debug data and when the bus is being used by the CPU. There is no arbitration for the bus as the CPU will always have access when required. A block diagram of the RDU is shown in Figure 23.
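The shared-bus scheme can be illustrated with a small model of the RDU's source selection. This Python sketch rests on assumptions: the DebugSrc encoding is the one listed in Table 27, lower-case block names are used as dictionary keys purely for illustration, and each block's cpu_data/cpu_debug_valid pair is treated as sampled once per pclk cycle.

```python
# DebugSrc encoding as listed in Table 27 (block names used as keys for illustration).
DEBUG_SRC = {0: "mmu", 1: "tim", 2: "lss", 3: "gpio", 4: "scb",
             5: "icu", 6: "cpr", 7: "diu", 8: "pcu"}


def rdu_select(debug_src, buses):
    """Return (debug_word, debug_valid) for the block selected by DebugSrc.

    buses maps a block name to the (cpu_data, cpu_debug_valid) pair it drives
    in the current cycle. The word is only debug data when the valid flag is
    set; when it is clear the bus is carrying a CPU read instead.
    """
    try:
        block = DEBUG_SRC[debug_src]
    except KeyError:
        raise ValueError(f"unsupported DebugSrc value {debug_src}") from None
    data, valid = buses[block]
    return data & 0xFFFFFFFF, bool(valid)
```

No arbitration is modelled: a CPU read simply shows up as a cycle with the valid flag low.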
Figure 23. Realtime Debug Unit block diagram



Table 25. RDU I/Os

Port name             Pins  I/O  Description
diu_cpu_data          32    In   Read data bus from the DIU block
cpr_cpu_data          32    In   Read data bus from the CPR block
gpio_cpu_data         32    In   Read data bus from the GPIO block
icu_cpu_data          32    In   Read data bus from the ICU block
lss_cpu_data          32    In   Read data bus from the LSS block
pcu_cpu_debug_data    32    In   Read data bus from the PCU block



Doc: SoPEC_hardware_design 
Version: 2.3 



S3 Proprietary Document 



29 Nov 2002 
Page 96 




SoPEC : Hardware Design 



Table 25. RDU I/Os (continued)

Port name             Pins  I/O  Description
scb_cpu_data          32    In   Read data bus from the SCB block
tim_cpu_data          32    In   Read data bus from the TIM block
diu_cpu_debug_valid   1     In   Signal indicating the data on the diu_cpu_data bus is
                                 valid debug data.
tim_cpu_debug_valid   1     In   Signal indicating the data on the tim_cpu_data bus is
                                 valid debug data.
scb_cpu_debug_valid   1     In   Signal indicating the data on the scb_cpu_data bus is
                                 valid debug data.
pcu_cpu_debug_valid   1     In   Signal indicating the data on the pcu_cpu_data bus is
                                 valid debug data.
lss_cpu_debug_valid   1     In   Signal indicating the data on the lss_cpu_data bus is
                                 valid debug data.
icu_cpu_debug_valid   1     In   Signal indicating the data on the icu_cpu_data bus is
                                 valid debug data.
gpio_cpu_debug_valid  1     In   Signal indicating the data on the gpio_cpu_data bus is
                                 valid debug data.
cpr_cpu_debug_valid   1     In   Signal indicating the data on the cpr_cpu_data bus is
                                 valid debug data.
debug_data_out        18    Out  Output debug data to be muxed on to the PHI/GPIO/other
                                 pins
debug_data_valid      1     Out  Debug valid signal indicating the validity of the data
                                 on debug_data_out. This signal is used in all debug
                                 configurations
debug_cntrl           19    Out  Control signal for each PHI bound debug data line
                                 indicating whether or not the debug data should be
                                 selected by the pin mux



As there are no spare pins that can be used to output the debug data to an external capture device, some of the existing I/Os will have a debug multiplexer placed in front of them to allow them to be used as debug pins. Unfortunately many of the pins on SoPEC cannot even be multiplexed in this fashion, so it will not be possible to output a full 32-bit debug data word every cycle. The exact number of pins available for multiplexing had yet to be finalised at the time of writing. This specification assumes 20 pins will be available, but this can easily be revised up or, more likely, down. Furthermore, not every pin that has a debug mux will always be available to carry the debug data, as it may be engaged in its primary purpose, e.g. as a GPIO pin. The RDU therefore outputs a debug_cntrl signal with each debug data bit to indicate whether the mux associated with each debug pin should select the debug data or the normal data for the pin. The DebugPinSel register is used to determine which of the 20 potential debug pins are enabled for debug at any particular time.
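The per-pin selection made by each debug multiplexer can be sketched as follows. This is an illustrative Python model only; it assumes one debug_cntrl bit per pin, which simplifies the real arrangement (debug_cntrl[0] controls two pins, debug_data_valid and pclk_out), and it treats the pin values as plain 0/1 integers.

```python
def drive_debug_pins(debug_cntrl, debug_bits, normal_bits):
    """Return the value driven on each multiplexed pin for one cycle.

    Pin i outputs debug_bits[i] when bit i of debug_cntrl is set and its
    normal-function value normal_bits[i] otherwise (simplified one-bit-per-pin
    mapping, see lead-in).
    """
    if len(debug_bits) != len(normal_bits):
        raise ValueError("one debug bit and one normal bit are needed per pin")
    return [d if (debug_cntrl >> i) & 1 else n
            for i, (d, n) in enumerate(zip(debug_bits, normal_bits))]
```

A pin whose control bit is clear keeps its primary function (for example a GPIO output), so debug capture can coexist with normal operation on the other pins.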

As it is not possible to output a full 32-bit debug word every cycle, the RDU supports the outputting of an n-bit sub-word every cycle to the enabled debug pins. Each debug test would then need to be re-run a number of times with a different portion of the debug word being output on the n-bit sub-word each time. The data from each run should then be correlated to create a full 32-bit (or whatever size is needed) debug word for every cycle. The debug_data_valid and pclk_out signals will accompany every sub-word to allow the data to be sampled correctly. The pclk_out signal is sourced close to its output pad rather than in the RDU to minimise the skew between the rising edge of the debug data signals (which should be registered close to their output pads) and the rising edge of pclk_out.
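The correlation step can be sketched in Python for an external capture tool. It assumes that every run repeats exactly the same test, that the captures are already aligned cycle-for-cycle (only cycles with debug_data_valid high are kept), and that the DebugDataSrc settings used in each run are recorded alongside the samples; none of this tooling is specified in this document.

```python
def merge_runs(runs, width=32):
    """Rebuild full debug words from several partial debug runs.

    runs: list of (src_bits, samples). src_bits[k] is the debug-word bit that
    was routed to sub-word bit k during that run; samples holds one captured
    sub-word per cycle in which debug_data_valid was high.
    Assumes all runs repeat the same test and are already cycle-aligned.
    """
    n_cycles = len(runs[0][1])
    words = [0] * n_cycles
    covered = 0
    for src_bits, samples in runs:
        if len(samples) != n_cycles:
            raise ValueError("runs are not cycle-aligned")
        for cycle, sub_word in enumerate(samples):
            for k, src in enumerate(src_bits):
                if (sub_word >> k) & 1:
                    words[cycle] |= 1 << src
        for src in src_bits:
            covered |= 1 << src
    if covered != (1 << width) - 1:
        raise ValueError("some debug-word bits were never captured")
    return words
```

With an 18-bit sub-word, two runs are enough to cover all 32 bits of the debug word.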

As multiple debug runs will be needed to obtain a complete set of debug data, the n-bit sub-word will need to contain a different bit pattern for each run. For maximum flexibility each debug pin has an associated DebugDataSrc register that allows any of the 32 bits of the debug data word to be output on that particular debug data pin. The debug data pin must be enabled for debug operation by having its corresponding bit in the DebugPinSel register set for the selected debug data bit to appear on the pin.

The size of the sub-word is determined by the number of enabled debug pins, which is controlled by the DebugPinSel register. Note that the debug_data_valid signal is always output. Furthermore debug_cntrl[0] (which is configured by DebugPinSel[0]) controls the mux for both the debug_data_valid and pclk_out signals, as both of these must be enabled for any debug operation.
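One cycle of the RDU's output stage can be sketched in Python. The model is hedged. It assumes the following: debug_data_out[N] is driven by DebugDataSrcN. debug_cntrl simply mirrors DebugPinSel. Bit 0 of DebugPinSel is the debug_data_valid/pclk_out pin and bits 1 to 18 are the data pins, following the example mapping in Table 26. The sub-word width is taken to be the number of enabled data pins.

```python
DATA_PINS = 18  # debug_data_out[17:0]


def rdu_debug_outputs(debug_word, debug_pin_sel, debug_data_src):
    """Return (debug_data_out, debug_cntrl, sub_word_width) for one cycle.

    debug_pin_sel: 19-bit DebugPinSel value; bit 0 enables debug_data_valid
    and pclk_out, bits 1-18 are assumed to enable the data pins.
    debug_data_src: eighteen 5-bit DebugDataSrcN values; entry N selects which
    bit of the 32-bit debug word appears on debug_data_out[N].
    """
    if len(debug_data_src) != DATA_PINS:
        raise ValueError("expected one DebugDataSrc value per data pin")
    debug_data_out = 0
    for n, src in enumerate(debug_data_src):
        debug_data_out |= ((debug_word >> (src & 0x1F)) & 1) << n
    debug_cntrl = debug_pin_sel & 0x7FFFF                     # 19 control bits
    width = bin((debug_pin_sel >> 1) & 0x3FFFF).count("1")    # enabled data pins
    return debug_data_out, debug_cntrl, width
```

Reprogramming only the DebugDataSrcN registers between runs changes which bits are observed, without touching the pin enables.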

The mapping of debug_data_out[n] signals onto individual pins will take place outside the RDU. When the exact mapping has been finalised it will be recorded here. A proposed mapping is shown in Table 26 below.



Table 26. Example DebugPinSel mapping

DebugPinSel bit  Pin
0                phi_frclk. The debug_data_valid signal will appear on this pin
                 when enabled. Enabling this pin also automatically enables the
                 phi_readl pin, which will output the pclk_out signal
1                phi^profiJo
2                phi_lsyncl
3                test pin 1
4                test pin 2
5-18             gpio[0...13]



Table 27. RDU Configuration Registers

Address       Register       #bits  Reset     Description
0x80          DebugSrc       4      0x00      Denotes which block is supplying the debug
                                              data. The encoding of this block is given
                                              below.
                                              0 - MMU
                                              1 - TIM
                                              2 - LSS
                                              3 - GPIO
                                              4 - SCB
                                              5 - ICU
                                              6 - CPR
                                              7 - DIU
                                              8 - PCU
0x84          DebugPinSel    19     0x0_0000  Determines whether a pin is used for debug
                                              data output. A provisional mapping of pin to
                                              bit position is given in Table 26.
                                              1 - Pin outputs debug data
                                              0 - Normal pin function
0x88 to 0xCC  DebugDataSrcN  5      0x00      Selects which bit of the 32-bit debug data
                                              word will be output on debug_data_out[N]


11.9 Interrupt Operation

The interrupt controller unit (see chapter 14) generates an interrupt request by driving interrupt request lines with the appropriate interrupt level. LEON supports 15 levels of interrupt with level 15 as the highest level (the SPARC architecture manual [32] states that level 15 is non-maskable, but we have the freedom to mask this if desired). The CPU will begin processing an interrupt exception when execution of the current instruction has completed, and it will only do so if the interrupt level is higher than the current processor priority. If a second interrupt request arrives with the same level as an executing interrupt service routine, the exception will not be processed until the executing routine has completed.
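The acceptance rule can be written down as a small predicate evaluated at an instruction boundary. This Python sketch is illustrative. Here "current processor priority" is taken to be the PSR's PIL field, and a flag switches between masking level 15 (the option this document reserves) and the standard SPARC behaviour, in which level 15 is taken regardless of PIL. Traps must still be enabled in both cases.

```python
def cpu_takes_interrupt(irl, pil, et, level15_maskable=True):
    """Decide whether the CPU starts interrupt processing for request level irl.

    irl: level on the interrupt request lines (0 means no request).
    pil: current processor interrupt level (PSR.PIL).
    et:  PSR Enable Traps bit; no interrupt is taken while it is clear.
    level15_maskable: True models masking level 15 as this document allows;
    False models the standard SPARC non-maskable level 15.
    """
    if not et or irl == 0:
        return False
    if irl == 15 and not level15_maskable:
        return True
    return irl > pil
```

A request at the same level as the running service routine fails the `irl > pil` test, which is why it waits until that routine completes.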

When an interrupt trap occurs the LEON hardware will place the program counters (PC and nPC) into two local registers. The interrupt handler routine is expected, as a minimum, to place the PSR register in another local register to ensure that the LEON can correctly return to its pre-interrupt state. The 4-bit interrupt level (irl) is also written to the trap type (tt) field of the TBR (Trap Base Register) by hardware. The TBR then contains the vector of the trap handler routine to which the processor will jump. The TBA (Trap Base Address) field of the TBR must have a valid value before any interrupt processing can occur, so it should be configured at an early stage.
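The handler address can be computed from the SPARC V8 TBR layout. In that layout, TBA occupies bits 31:12, tt occupies bits 11:4 and bits 3:0 are zero. An interrupt at level n uses trap type 0x10 + n, so the 4-bit level lands in tt[3:0]. This Python sketch is meant as a cross-check when laying out a trap table. It is not a description of SoPEC boot code.

```python
def trap_vector(tba, irl):
    """Address the CPU jumps to for an interrupt at level irl (1..15).

    SPARC V8 TBR layout: TBA in bits 31:12, trap type (tt) in bits 11:4,
    bits 3:0 zero. Interrupt level n uses tt = 0x10 + n.
    """
    if not 1 <= irl <= 15:
        raise ValueError("interrupt level must be 1..15")
    tt = 0x10 + irl
    return (tba & 0xFFFFF000) | (tt << 4)
```

Each trap table entry is 16 bytes (four instructions), so a handler normally branches out of its slot immediately.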

Interrupt pre-emption is supported while the ET (Enable Traps) bit of the PSR is set. This bit is cleared during the initial trap processing. In initial simulations the ET bit was observed to be cleared for up to 30 cycles. This causes significant additional interrupt latency in the worst case, where a higher priority interrupt arrives just as a lower priority one is taken.

The interrupt acknowledge cycles shown in Figure 24 below are derived from simulations of the LEON processor and accompanying interrupt controller. This interrupt controller will be replaced by the ICU in the SoPEC design. The LEON signal names are used for future reference. An interrupt is asserted by driving its (encoded) level on the iui_irl[3:0] signals. The LEON core responds to this, with variable timing, by reflecting the level of the taken interrupt on the iuo_irl[3:0] signals and asserting the acknowledge signal iuo_intack. The interrupt controller then removes the interrupt level one cycle after it has seen the level acknowledged by the core. If there is another pending interrupt (of lower priority) then this should be driven on iui_irl[3:0] and the CPU will take that interrupt (the level 9 interrupt in the example below) once it has finished processing the higher priority interrupt. The iuo_irl[3:0] signals always reflect the level of the last taken interrupt, even when the CPU has finished processing all interrupts.
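The controller-side sequence can be sketched in Python. The sketch assumes that the controller prioritises purely by level, that each level is acknowledged before the next is driven, and that no new requests arrive meanwhile. The ICU's actual arbitration (chapter 14) may differ.

```python
def next_irl(pending):
    """Level driven on iui_irl[3:0]: the highest pending level, or 0 if none."""
    return max((lvl for lvl in pending if 1 <= lvl <= 15), default=0)


def service_order(pending):
    """Order in which pending levels are taken, assuming each is acknowledged
    (iuo_intack) before the next is driven and no new requests arrive."""
    pending = set(pending)
    order = []
    level = next_irl(pending)
    while level:
        order.append(level)      # core reflects the level on iuo_irl[3:0]
        pending.discard(level)   # controller removes the acknowledged level
        level = next_irl(pending)
    return order
```

For the pair of pending levels in Figure 24 (0xA and 0x9), the model takes level 10 first and then level 9.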



Figure 24. Interrupt acknowledge cycles for a single and pending interrupts (waveforms: pclk, iui_irl[3:0], iuo_irl[3:0], iuo_intack)
11.10 Boot Operation

See section 17.2 for a description of the SoPEC boot operation.

11.11 Software Debug

Software debug mechanisms are discussed in the "SoPEC Software Debug" document [15].
12 Serial Communications Block (SCB)

12.1 Overview

The Serial Communications Block (SCB) handles the movement of all data between the SoPEC and the host device (i.e. PC) and between master and slave SoPEC devices. The SCB consists of a USB 1.1 device controller, an Inter-SoPEC Interface (ISI) and a DMA manager. A block diagram of the SCB is shown in Figure 25 below. The major blocks of the SCB, namely the ISI, USB and DMA manager, could be implemented as separate blocks but are integrated to take advantage of the performance gains and design simplifications that a tighter coupling allows.



Figure 25. Serial Communications Block

The USB Controller will be an imported piece of IP. There are many possible sources of this block, but it is likely that it will be supplied by the silicon vendor: all three current silicon vendor candidates will supply USB 1.1 controllers, although some of these have been sourced from a third party.

The SCB can be seen in the context of the overall SoPEC device in Figure 26 below.

Figure 26. SoPEC top-level block diagram
12.2 Definitions of I/Os

Table 28. Serial Communications Block I/O

Port name             Pins  I/O  Description
Clocks and Resets
prst_n                1     In   System reset signal. Active low.
pclk                  1     In   System clock.
usb_clk               1     In   Clock for the USB controller block.
isi_cpr_reset_n       1     Out  Signal from the ISI indicating that ISI activity has been
                                 detected while in sleep mode and so the chip should be
                                 reset. Active low.
usb_cpr_reset_n       1     Out  Signal from the USB controller that a USB reset has
                                 occurred. Active low.
CPU Interface
cpu_adr[n:2]          n-1   In   CPU address bus. Exact width is currently TBD as it is
                                 dependent on the address maps of imported IP
cpu_dataout[31:0]     32    In   Shared write data bus from the CPU
scb_cpu_data[31:0]    32    Out  Read data bus to the CPU
cpu_rwn               1     In   Common read/not-write signal from the CPU
cpujc[2:0]            3     In   CPU Function Code signals.
cpu_scb_sel           1     In   Block select from the CPU. When cpu_scb_sel is high both
                                 cpu_adr and cpu_dataout are valid
scb_cpu_rdy           1     Out  Ready signal to the CPU. When scb_cpu_rdy is high it
                                 indicates the last cycle of the access. For a write cycle
                                 this means cpu_dataout has been registered by the SCB and
                                 for a read cycle this means the data on scb_cpu_data is
                                 valid.
scb_cpu_berr          1     Out  Bus error signal to the CPU indicating an invalid access.
scb_cpu_debug_valid   1     Out  Signal indicating that the data currently on
                                 scb_cpu_data is valid debug data
Interrupt signals
dma_icu_irq           1     Out  DMA interrupt signal to the interrupt controller block.
isi_icu_irq           1     Out  ISI interrupt signal to the interrupt controller block.
usb_icu_irq           1     Out  USB interrupt signal to the interrupt controller block.
DIU interface
scb_diu_wadr[21:5]    17    Out  Write address bus to the DIU
scb_diu_data[63:0]    64    Out  Data bus to the DIU.
scb_diu_wreq          1     Out  Write request to the DIU
diu_scb_wack          1     In   Acknowledge from the DIU that the write request was
                                 accepted.
scb_diu_wvalid        1     Out  Signal from the SCB to the DIU indicating that the data
                                 currently on the scb_diu_data[63:0] bus is valid
GPIO Interface
isi_gpio_dout[1:0]    2     Out  ISI output data to GPIO pins
isi_gpio_e[1:0]       2     Out  ISI output enable to GPIO pins
gpio_isi_din[1:0]     2     In   Input data from GPIO pins to ISI
12.3 Multi-SoPEC Systems

While single SoPEC systems are expected to form the majority of SoPEC systems, the SoPEC device must also support its use in multi-SoPEC systems such as that shown in Figure 27 below. A SoPEC may be assigned any one of a number of identities in a multi-SoPEC system. A SoPEC may be one or more of a PrintMaster, a LineSyncMaster, an ISIMaster, a StorageSoPEC or an ISISlave SoPEC.



Figure 27. A duplex system featuring four printing SoPECs with a single SoPEC DRAM device



12.3.1 ISIMaster device

The ISIMaster is the only device allowed to drive the common ISI line (see Figure 28) and interfaces directly with the host. In most systems the ISIMaster will simply be the SoPEC connected to the USB bus. Future systems, however, may employ an ISI-Bridge chip to interface between the host and the ISI bus, and in such systems the ISI-Bridge chip will be the ISIMaster. There can only be one ISIMaster on an ISI bus.

12.3.2 PrintMaster device

The PrintMaster device is responsible for co-ordinating all aspects of the print operation. This includes starting the print operation in all printing SoPECs and communicating status back to the host. When the ISIMaster is a SoPEC device it is also likely to be the PrintMaster. There may only be one PrintMaster in a system and it is most likely to be a SoPEC device.
12.3.3 LineSyncMaster device

The LineSyncMaster device generates the lsync pulse that all SoPECs in the system must synchronize their line outputs with. Any SoPEC in the system could act as a LineSyncMaster, although the PrintMaster is probably the most likely candidate. It is possible that the LineSyncMaster may not be a SoPEC device at all: it could, for example, come from some OEM motor control circuitry. There may only be one LineSyncMaster in a system.



12.3.4 Storage device

For certain printer types it may be realistic to use one SoPEC as a storage device without using its print engine capability, that is, to effectively use it as an ISI-attached DRAM. A storage SoPEC would receive data from the ISIMaster (most likely to be an ISI-Bridge chip) and then distribute it to the other SoPECs as required. No other type of data flow (e.g. ISISlave -> storage SoPEC -> ISISlave) would need to be supported in such a scenario. The SCB supports this functionality at no additional cost because the CPU handles the task of transferring outbound data from the embedded DRAM to the ISI transmit buffer. The CPU in a storage SoPEC will have almost nothing else to do.

12.3.5 ISISlave device

Multi-SoPEC systems will contain one or more ISISlave SoPECs. An ISISlave SoPEC is primarily used to generate dot data for the printhead IC it is driving.

12.3.6 ISI-Bridge device

SoPEC is targeted at the low-cost small office / home office (SoHo) market. It may also be used in future systems that target different market segments, which are likely to have a high speed interface capability. A future device, known as an ISI-Bridge chip, is envisaged which will feature both a high speed interface (such as USB 2.0, Ethernet or IEEE 1394) and one or more ISI interfaces. The use of multiple ISI buses would allow the construction of independent print systems within the one printer. The ISI-Bridge would be the ISIMaster for each of the ISI buses it interfaces to.

12.3.7 Host device

The host device will almost always be, but is not required to be, a PC. Any device that can act as a USB host or that can interface to an ISI-Bridge chip could be the host device. In particular, with the development of USB On-The-Go (USB OTG), it is possible that a number of USB OTG enabled products such as PDAs or digital cameras will be able to directly interface with a SoPEC printer.

12.4 Types of communication

12.4.1 Communications with host

The host communicates directly with the ISIMaster in order to print pages. When the ISIMaster is a SoPEC, the communications channel is USB 1.1.

12.4.1.1 Host to ISIMaster communication

The host will need to communicate the following information to the ISIMaster device:

• Communications channel configuration and maintenance information

• All data destined for PrintMaster, ISISlave or storage SoPEC devices. This data is simply relayed by the ISIMaster

• Mapping of virtual communications channels, such as USB endpoints, to ISI destination
12.4.1.2 ISIMaster to host communication

The ISIMaster will need to communicate the following information to the host:

• Communications channel configuration and maintenance information

• All data originating from the PrintMaster, ISISlave or storage SoPEC devices and destined for the host. This data is simply relayed by the ISIMaster

12.4.1.3 Host to PrintMaster communication

The host will need to communicate the following information to the PrintMaster device: 

• Program code for the PrintMaster 

• Compressed page data for the PrintMaster 

• Control messages to the PrintMaster 

• Tables and static data required for printing e.g. dead nozzle tables, dither matrices etc. 

• Authenticatable messages to upgrade the printer's capabilities 

12.4.1.4 PrintMaster to host communication 

The PrintMaster will need to communicate the following information to the host: 

• Printer status information (i.e. authentication results, paper empty/jammed etc.)

• Dead nozzle information 

• Memory buffer status information 

• Power management status 

• Encrypted SoPEC Id for use in the generation of PRINTER_QA keys during factory programming

12.4.1.5 Host to ISISlave communication

All communication between the host and ISISlave SoPEC devices must take place via the ISIMaster. In the case of a SoPEC ISIMaster it is possible to configure each individual USB endpoint to act as a control channel to an ISISlave SoPEC if desired, although the endpoints will more usually be used to transport data. The host will need to communicate the following information to ISISlave devices over the comms/ISI:

• Program code for ISISlave SoPEC devices 

• Compressed page data for ISISlave SoPEC devices 

• Control messages to the ISISlave SoPEC (where a control channel is supported) 

• Tables and static data required for printing e.g. dead nozzle tables, dither matrices etc. 

• Authenticatable messages to upgrade the printer's capabilities 

12.4.1.6 ISISlave to host communication

All communication between the ISISlave SoPEC devices and the host must take place via the ISIMaster. The ISISlave will need to communicate the following information to the host over the comms/ISI:

• Responses to the host's control messages (where a control channel is supported)

• Dead nozzle information from the ISISlave SoPEC.

• Encrypted SoPEC Id for use in the generation of PRINTER_QA keys during factory programming
12.4.2 Communication over ISI

12.4.2.1 ISIMaster to PrintMaster communication

The ISIMaster and PrintMaster will often be the same physical device. When they are different devices then the following information needs to be exchanged over the ISI:

• All data from the host destined for the PrintMaster (see section 12.4.1.3). This data is simply relayed by the ISIMaster

12.4.2.2 PrintMaster to ISIMaster communication

The ISIMaster and PrintMaster will often be the same physical device. When they are different devices then the following information needs to be exchanged over the ISI:

• All data from the PrintMaster destined for the host (see section 12.4.1.4). This data is simply relayed by the ISIMaster

12.4.2.3 ISIMaster to ISISlave communication

The ISIMaster may wish to communicate the following information to the ISISlaves:

• All data (including program code such as ISIId enumeration) originating from the host and destined for the ISISlave (see section 12.4.1.5). This data is simply relayed by the ISIMaster

• wake up from sleep mode

12.4.2.4 ISISlave to ISIMaster communication

The ISISlave may wish to communicate the following information to the ISIMaster:

• All data originating from the ISISlave and destined for the host (see section 12.4.1.6). This data is simply relayed by the ISIMaster

12.4.2.5 PrintMaster to ISISlave communication

When the PrintMaster is not the ISIMaster, all ISI communication is done in response to ISI ping packets (see 12.6.4.5). When the PrintMaster is the ISIMaster then it will of course communicate directly with the ISISlaves. The PrintMaster SoPEC may wish to communicate the following information to the ISISlaves:

• Ink status e.g. requests for dotCount data i.e. the number of dots in each color fired by the printheads connected to the ISISlaves

• configuration of GPIO ports e.g. for clutch control and lid open detect

• power down command telling the ISISlave to enter sleep mode

• ink cartridge fail information

This list is not complete and the time constraints associated with these requirements have yet to be determined.

In general the PrintMaster may need to be able to:

• send messages to an ISISlave which will cause the ISISlave to return the contents of ISISlave registers to the PrintMaster or

• program ISISlave registers with values sent by the PrintMaster

This should be under the control of software running on the CPU which writes messages to the ISI/SCB interface.
12.4.2.6 ISISlave to PrintMaster communication

ISISlaves may need to communicate the following information to the PrintMaster:

• ink status e.g. dotCount data i.e. the number of dots in each color fired by the printheads connected to the ISISlaves

• band related information e.g. finished band interrupts

• page related information i.e. buffer underrun, page finished interrupts

• MMU security violation interrupts

• GPIO interrupts and status e.g. clutch control and lid open detect

• printhead temperature

• printhead dead nozzle information from SoPEC printhead nozzle tests

• power management status

This list is not complete and the time constraints associated with these requirements have yet to be determined.

As the ISI is an insecure interface, commands issued over the ISI should be of limited capability, e.g. only limited register writes allowed. The software protocol needs to be constructed with this in mind. In general ISISlaves may need to return register or status messages to the PrintMaster or ISIMaster. They may also need to indicate to the PrintMaster or ISIMaster that a particular interrupt has occurred on the ISISlave. This should be under the control of software running on the CPU which writes messages to the ISI block.
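One way for ISISlave software to honour the "limited register writes" guideline is an explicit allow-list checked before any write is performed. The Python sketch below is entirely hypothetical. The message shape (a command string plus address and value), the handler name and the allowed addresses are illustrative only and do not come from any SoPEC register map.

```python
# Hypothetical allow-list: register addresses an ISISlave accepts writes to
# over the ISI. Addresses are illustrative, not taken from the SoPEC map.
ISI_WRITABLE = frozenset({0x100, 0x104})


def handle_isi_command(regs, cmd, addr, value=None):
    """Apply a register read/write request received over the (insecure) ISI.

    regs: dict modelling the slave's register file (hypothetical).
    Reads return the current value; writes are only honoured for
    addresses on the allow-list.
    """
    if cmd == "read":
        return regs.get(addr, 0)
    if cmd == "write":
        if addr not in ISI_WRITABLE:
            raise PermissionError(f"ISI write to 0x{addr:X} not permitted")
        regs[addr] = value & 0xFFFFFFFF
        return None
    raise ValueError(f"unknown command {cmd!r}")
```

Keeping the check in slave-side software means a compromised ISI master can at most touch the registers the protocol deliberately exposes.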



12.4.2.7 ISISlave to ISISlave communication

It is currently not anticipated that there will be any direct communication between ISISlave SoPECs. However, they can communicate indirectly via the ISIMaster SoPEC. The most likely scenario for such a communication mechanism is when the PrintMaster is not the ISIMaster (see sections 12.4.2.5 and 12.4.2.6 for a description of the information exchanged between a PrintMaster and an ISISlave). ISISlave to ISISlave communication would also be required when sending data stored in a storage SoPEC device to an ISISlave.

12.5 USB

The USB 1.1 interface for the printer should consist of the USB connector, the necessary discretes for USB signalling and the SoPEC device. A SoPEC printer will act as a self-powered, full-speed device and SoPEC itself will not draw any power from the USB cable. It will support control and bulk transfers. Interrupt transfers are not considered necessary because the required interrupt-type functionality can be achieved by sending query messages over the control channel on a scheduled basis. There is no requirement to support either isochronous or low-speed transfers. The USB controller must support at least 5 USB endpoints: a control endpoint (endpoint 0) and 4 bulk-data type endpoints. These 4 bulk-data type endpoints can be used for the transfer of any type of data: compressed page data, program data or control messages. They may also be mapped on to any target destination in a multi-SoPEC system, i.e. configuration is completely programmable. They are envisaged as always being used as USB OUT endpoints, i.e. they will transport data from the host to SoPEC. Any feedback data (e.g. status information) will be returned to the host on the control channel (endpoint 0).

The USB device enumeration process will be handled by the SoPEC CPU and USB controller. Note that this requires the on-chip ROM to contain all the required USB driver code. This is not expected to be the full USB driver but rather a "USB-lite" driver that has sufficient functionality to download a program to DRAM.

Details of the configuration registers and interface signals will be provided when the implementation IP for the USB controller core has been selected. There are several potential candidates for the USB 1.1 controller that are being evaluated in terms of cost, maturity, licensing requirements/restrictions, quality of deliverables etc. As already mentioned, the choice of silicon vendor is likely to play a large part in selecting the USB controller.
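The programmable mapping of the four bulk-data endpoints onto destinations can be pictured as a small routing table held by ISIMaster software. This Python sketch is hypothetical. The class name, the "local" default and the use of an integer ISIId as a destination are assumptions made for illustration. Endpoint 0 stays reserved as the control channel.

```python
class EndpointRouter:
    """Hypothetical routing table for bulk endpoints 1-4 on an ISIMaster SoPEC."""

    LOCAL = "local"  # data consumed by this SoPEC itself

    def __init__(self):
        self.routes = {ep: self.LOCAL for ep in range(1, 5)}

    def map(self, endpoint, destination):
        """Point a bulk endpoint at a destination (e.g. an ISISlave's ISIId)."""
        if endpoint not in self.routes:
            raise ValueError("only bulk endpoints 1-4 can be mapped")
        self.routes[endpoint] = destination

    def route(self, endpoint, payload):
        """Return (destination, payload) for data received on a bulk endpoint."""
        return self.routes[endpoint], payload
```

Because the table is just software state, the host can reconfigure it at any time through the control channel.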

12.5.1 ISIMaster/ISISlave Identification

While the USB controller is used for data transfer if a SoPEC is an ISIMaster, it may in certain cases also be used to transfer data to an ISISlave. If the USB is not used for data transfer the device will certainly be an ISISlave. In this case the USB pins could be used to identify the device as an ISISlave, as the USB device controller is expected to allow the single-ended quiescent state of the USB pins to be read by the CPU either directly or indirectly (as there should be a register indicating whether the USB controller is operating as a full-speed or low-speed device). We adopt the convention that an ISIMaster SoPEC has its USB pins configured for full-speed operation (i.e. a pull-up resistor on D+) and an ISISlave SoPEC has its USB pins configured for low-speed operation (i.e. a pull-up resistor on D-). This allows the ROM boot code to quickly determine whether the SoPEC is an ISIMaster or ISISlave without needing to wait for USB activity. While the ISISlave SoPEC's USB controller believes it is a low-speed device, it is never used and may be disabled completely (if possible) once the device has been identified as an ISISlave. Note that other combinations on the D+ and D- lines may result in unreliable operation of the USB controller.
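The boot-time decision based on the pull-up convention can be sketched as a function of the idle line levels. This Python sketch assumes that the CPU can read both single-ended line states as booleans. How that state is exposed depends on the USB controller IP, which has not yet been chosen.

```python
def role_from_usb_pullup(dplus_high, dminus_high):
    """Classify a SoPEC from the quiescent (idle) state of its USB lines.

    Pull-up on D+ (full-speed idle: D+ high, D- low) -> ISIMaster.
    Pull-up on D- (low-speed idle: D- high, D+ low)  -> ISISlave.
    Any other combination is not a supported configuration.
    """
    if dplus_high and not dminus_high:
        return "ISIMaster"
    if dminus_high and not dplus_high:
        return "ISISlave"
    return "unknown"
```

Returning "unknown" for the other line combinations mirrors the warning that they may give unreliable controller operation.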

The SoPEC's identity as an ISIMaster or ISISlave may also be determined from USB or ISI activity. If activity is seen on USB endpoints 2-4 then the device is an ISIMaster (note that it is not necessarily an ISIMaster if activity is only seen on endpoints 0 or 1) and the ISI may automatically configure itself as an ISIMaster in this situation. If the ISI receives ping packets then it is an ISISlave, as only the ISIMaster can send ping packets.
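The activity-based rules translate directly into a classification function. In this Python sketch the inputs are assumed to be the set of USB endpoint numbers on which traffic has been observed and a flag recording whether an ISI ping packet has been received. Both are hypothetical summaries that software would have to collect.

```python
def role_from_activity(usb_endpoints_seen, isi_ping_received):
    """Infer the SoPEC's role from observed USB/ISI activity."""
    if isi_ping_received:
        return "ISISlave"        # only the ISIMaster sends ping packets
    if any(ep in (2, 3, 4) for ep in usb_endpoints_seen):
        return "ISIMaster"
    return "undetermined"        # activity on endpoints 0/1 alone is not conclusive
```

Unlike the pull-up scheme, this approach cannot decide anything until traffic arrives.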

The most suitable ISIMaster/ISISlave identification scheme (i.e. use of USB pins or looking for USB/ISI activity) can be chosen by the software for any given printer.

12.5.2 Wake-up from sleep mode

The SoPEC will be placed in sleep mode after a suspend command is received by the USB controller. The extent of power-down in sleep mode is currently TBD (different silicon vendors offer different options) but it is expected to involve the loss of DRAM contents at a minimum. The USB controller (or portions of it) will continue to be powered and clocked in sleep mode. It is likely that a USB reset, as opposed to a device resume, will be required to bring SoPEC out of its sleep state, as the sleep state is hoped to be logically equivalent to the power down state. The exact reawakening mechanism will be finalised when the sleep state is more precisely defined and the particular implementation of the USB controller is chosen.

The USB reset signal originating from the USB controller will be propagated to the CPR (as usb_cpr_reset_n) if the USBWakeupEnable bit of the WakeupEnable register (see Table 38) has been set. The USBWakeupEnable bit should therefore be set just prior to entering sleep mode.

There are no conditions that require the SoPEC to initiate a USB device wake-up (i.e. where SoPEC sig- 
nals resume to the host after being suspended by the host). 

12.5.3 USB Speed 

The USB speed will be determined by the amount of activity from other devices that share the USB bus with
the printer and the responsiveness of the host in handling USB interrupts. To guarantee bandwidth to the
printer it is recommended that no other devices are active on the USB bus between the printer and the host.
If the printer is connected to a USB2.0 host or hub it may limit the bandwidth available to other devices
connected to the same hub, but it would not significantly affect the bandwidth available to other devices
upstream of the hub. Used in the recommended configuration, it is expected that an effective bandwidth of
8-9 Mbit/s will be achieved.
12.6 ISI (Inter SoPEC Interface)

The ISI is utilised in all system configurations requiring more than one SoPEC. An example of such a
system, which requires four SoPECs for duplex A3 printing and an additional SoPEC used as a storage
device, is shown in Figure 27.

The ISI performs much the same function between an ISISlave SoPEC and the ISIMaster as the USB
connection performs between the ISIMaster and the host. This includes the transfer of all program data,
compressed page data and message (i.e. commands or status information) passing between the ISIMaster
and the ISISlave SoPECs. Existing requirements indicate that it is sufficient for the ISIMaster to initiate all
communication with the ISISlaves.

12.6.1 ISIMaster/ISISlave identification and ISISlave enumeration

Section 12.5.1 details how a SoPEC is configured as an ISIMaster or ISISlave. The ISIId is established by
software downloaded over the ISI (in broadcast mode) which looks at the input levels on a number of
GPIO pins to determine the ISIId. For any given printer that uses a multi-SoPEC configuration it is
expected that there will always be enough free GPIO pins on the ISISlaves to support this enumeration
mechanism.



12.6.2 Wake-up from sleep mode

Either the PrintMaster SoPEC or the host may place any of the ISISlave SoPECs in sleep mode prior to
going into sleep mode itself. The ISISlave device should then ensure that the ISIWakeupEnable bit of its
WakeupEnable register (see Table 38) is set prior to entering sleep mode. In an ISISlave device the ISI
block will continue to receive power and clock during sleep mode so that it may monitor the gpio_isi_din
lines for activity. When ISI activity is detected during sleep mode and the ISIWakeupEnable bit is set, the
ISI asserts the isi_cpr_reset_n signal. This will bring the rest of the chip out of sleep mode by means of a
wakeup reset. See chapter 16 for more details of reset propagation.

12.6.3 ISI speed

The ISI will need to run at a speed that will allow error-free transmission on the PCB while minimising the
buffering and hardware requirements on SoPEC. While an ISI speed of 10 Mbit/s is adequate to match the
effective USB1.1 bandwidth, it would limit the system performance when a high-speed connection (e.g.
USB2.0, IEEE 1394) is used to attach the printer to the PC. Although they would require the use of an extra
ISI-Bridge chip, such systems are envisaged in the future for more expensive printers (compared to the
low-cost basic SoPEC-powered printers that are initially being targeted).

An ISI line speed (i.e. the speed of each individual ISI wire) of 32 Mbit/s is therefore proposed as it will
allow ISI data to be oversampled 5 times (at a pclk frequency of 160 MHz). The total bandwidth of the ISI
will depend on the number of pins used to implement the interface. The current expectation is that two
pins will be used, giving a peak raw bandwidth of 64 Mbit/s, and this is the scenario that is used in this
document. However, the ISI protocol will work equally well if four pins are used for
transmission/reception, and this would give a peak raw bandwidth of 128 Mbit/s. The number of pins
available for the ISI is currently under investigation as part of the package selection process. With either
a two- or four-pin ISI solution, a 32 Mbit/s line speed would allow the movement of data into and out of a
storage SoPEC (as described in 12.3.4 above), which is the most bandwidth-hungry ISI use, in a timely
fashion.

The maximum effective bandwidth of a two-wire ISI, after allowing for protocol overheads and bus
turnaround times, is expected to be approx. 50 Mbit/s.
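The bandwidth figures above follow from simple arithmetic, reproduced in the C sketch below. The function names are hypothetical, and the effective-bandwidth model is an assumption made for illustration (one 290-bit long packet and one 14-bit ACK per exchange, each preceded by a one-bit-time gap, with no retries or ping traffic), not necessarily how the ~50 Mbit/s estimate was derived.

```c
#include <assert.h>

#define PCLK_HZ    160e6 /* pclk frequency quoted in the text */
#define OVERSAMPLE 5.0   /* pclk samples per ISI bit */

/* Speed of one ISI wire in bit/s: one bit every OVERSAMPLE pclk cycles. */
static double isi_line_speed(void) { return PCLK_HZ / OVERSAMPLE; }

/* Peak raw bandwidth: each pin carries one bit per bit time. */
static double isi_raw_bandwidth(int pins) { return pins * isi_line_speed(); }

/* Assumed model: a 290-bit long packet and a 14-bit ACK are each spread over
 * all pins, and every transmission is preceded by a one-bit-time gap.
 * Retransmissions and ping traffic are ignored. */
static double isi_effective_bandwidth(int pins) {
    const double payload_bits = 256.0;
    const double bit_times = 290.0 / pins + 1.0 + 14.0 / pins + 1.0;
    return payload_bits * isi_line_speed() / bit_times;
}
```

With two pins this model gives roughly 53 Mbit/s; the ping traffic and retransmissions it ignores would lower the figure further.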
12.6.4 ISI protocol

The ISI is a serial interface utilizing a two-wire half-duplex configuration as shown in Figure 28 below. An
ISIMaster must always be present and up to 14 ISISlaves may also be on the ISI bus. The ISI bus enables
broadcasting of data, ISIMaster to ISISlave communication, ISISlave to ISIMaster communication and
ISISlave to ISISlave communication. Flow control, error detection and retransmission of errored packets
are also supported. ISI transmission is asynchronous and a Start field is present in every transmitted packet
to ensure synchronization for the duration of the packet. Bit-stuffing is required as it is expected that
synchronization cannot be guaranteed for the length of the longest allowed packet.¹ Open Issue: This should
be confirmed with the spec of the crystal used with SoPEC. We may wish to constrain the spec of xtalin and
also xtalin for the ISI-Bridge chip to ensure the ISI cannot drift out of sync during packet reception.
Figure 28. ISI configuration with four SoPEC devices

To maximize the effective ISI bandwidth while minimising pin requirements, a two-wire half-duplex
interleaved transmission scheme is used. Figure 29 below shows how a 16-bit word is transmitted from an
ISIMaster to an ISISlave. Data is interleaved on a bit-by-bit basis over the two ISI lines and this requires all
ISI packets to be an even number of bits in length. This interleaving could easily be extended to four pins
if required.
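The bit-by-bit interleaving can be illustrated with a short C sketch. Since Figure 29 is not reproduced here, it assumes that even-numbered bits of the word travel on line 0 and odd-numbered bits on line 1, with bit 0 sent first. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 16-bit word across two ISI lines, lsb first.
 * Assumption: bit-time k carries word bit 2k on line 0 and bit 2k+1 on line 1.
 * In each output byte, bit k is the value driven during bit-time k. */
static void isi_interleave16(uint16_t word, uint8_t *line0, uint8_t *line1) {
    uint8_t l0 = 0, l1 = 0;
    for (int k = 0; k < 8; k++) {
        l0 |= (uint8_t)(((word >> (2 * k)) & 1u) << k);
        l1 |= (uint8_t)(((word >> (2 * k + 1)) & 1u) << k);
    }
    *line0 = l0;
    *line1 = l1;
}

/* Inverse operation, as performed by the receiving device. */
static uint16_t isi_deinterleave16(uint8_t line0, uint8_t line1) {
    uint16_t word = 0;
    for (int k = 0; k < 8; k++) {
        word |= (uint16_t)(((line0 >> k) & 1u) << (2 * k));
        word |= (uint16_t)(((line1 >> k) & 1u) << (2 * k + 1));
    }
    return word;
}
```

Because each bit-time carries two bits, a packet must contain an even number of bits, which is the constraint stated above.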

All ISI transactions are initiated by the ISIMaster and every non-broadcast data packet needs to be
acknowledged by the addressed recipient. An ISISlave may only transmit when it receives a ping packet
(see section 12.6.4.5) addressed to it. To avoid bus contention, all ISI devices must wait one bit-time (5 pclk
cycles) after detecting the end of a packet before transmitting a packet (assuming they are required to
transmit). All non-transmitting ISI devices must tristate their Tx drivers to avoid line contention. A pull-up
resistor is therefore required on both ISI lines to reduce the possibility of false data detection. The ISI
protocol is defined to avoid devices driving out of order (e.g. when an ISISlave is no longer being
addressed). As the ISI will use standard I/O pads there will be no physical collision detection mechanism.



1. Current max packet size 290 bits = 145 bits per line (on a 2-wire ISI) = 725 cycles of the 160 MHz pclk. Thus the pclks in the two
communicating ISI devices should not drift by more than one cycle in 725, i.e. 1379 ppm. Careful analysis of the crystal, PLL and
oscillator specs and the sync detection circuit is needed here to ensure our solution is robust.
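The footnote's timing budget can be checked with a few lines of C. The helper names are hypothetical; the criterion of at most one cycle of slip per packet is the footnote's own.

```c
#include <assert.h>

/* pclk cycles occupied by a packet of packet_bits spread over `lines` wires,
 * with `oversample` pclk cycles per ISI bit. */
static unsigned packet_pclk_cycles(unsigned packet_bits, unsigned lines,
                                   unsigned oversample) {
    unsigned bits_per_line = (packet_bits + lines - 1) / lines; /* round up */
    return bits_per_line * oversample;
}

/* Allowed clock mismatch, in ppm, if the two pclks may slip by at most
 * one cycle over the whole packet. */
static double drift_tolerance_ppm(unsigned cycles) {
    return 1e6 / (double)cycles;
}
```

For the 290-bit packet on a two-wire ISI this gives 725 cycles and a tolerance of about 1379 ppm.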
Figure 29. Half-duplex interleaved transmission from ISIMaster to ISISlave

There are three types of ISI packet: a long packet (used for data transmission), a ping packet (used by the
ISIMaster to prompt ISISlaves for packets) and a short packet (used to acknowledge receipt of a packet).
All ISI packets are delineated by Start and Stop fields and transmission is atomic, i.e. an ISI packet may
not be split or halted once transmission has started.

12.6.4.1 ISI transactions

The different types of ISI transactions are outlined in Figure 30 below. As described later all NAKs are 
inferred and ACKs are not addressed to any particular ISI device. 
Transaction 1: Long packet to an addressed ISISlave.
Transaction 2: Ping packet to an addressed ISISlave. The ISISlave has nothing to send.
Transaction 3: Ping packet to an addressed ISISlave. ISISlave A responds with a long packet to
ISISlave B and ISISlave B responds with an ACK or NAK.
Transaction 4: Ping packet to an addressed ISISlave. ISISlave A responds with a long packet to
the ISIMaster and the ISIMaster responds with an ACK or NAK.

Figure 30. ISI transactions (between the ISIMaster, ISISlave A and ISISlave B)



12.6.4.2 Start field description and bit stuffing

The Start field serves two purposes: to allow the start of a packet to be unambiguously identified and to
allow the receiving device to synchronise to the data stream. The symbol, or data value, used to identify a
Start field must not legitimately occur in the ensuing packet. Bit stuffing is used to guarantee that the Start
symbol will be unique in any valid (i.e. error-free) packet. The Start symbol should therefore be
sufficiently long to ensure that the bit stuffing overhead is low, but still short enough to limit its own
contribution to the packet overhead. A Start field length of 8 bits is therefore used as it is an effective
compromise between these two constraints. The Start field, like every byte in a packet, is transmitted with
its rightmost (lsb) bit first.

If the correct symbol value is used, bit stuffing offers the further advantage of forcing transitions on the ISI
lines, which will allow synchronization to be maintained. Unfortunately a symbol value that is good for
forcing transitions (e.g. 0x00) is not good for guaranteeing initial synchronization and vice versa, i.e. a
symbol such as 0xAA would ensure initial synchronization but cannot prevent synchronization being lost if
a long run of zeroes or ones is subsequently transmitted.

To resolve this conflict the Start symbol will be 0xAA and three different types of bit stuffing are used.
Whenever 0xAA is encountered in the data stream a 0 is inserted before the msb, resulting in the 9-bit
value 0x12A (i.e. b10101010 -> b100101010). To ensure transitions occur during a long run of zeroes a 1
is inserted after 7 zeroes, thus 0x00 becomes 0x080 (i.e. b00000000 -> b010000000). Likewise, to ensure
transitions will occur during a run of ones a 0 is inserted after 7 ones and so 0xFF becomes 0x17F (i.e.
b11111111 -> b101111111). The receiving ISI device must detect these special values and strip out the
inserted ones and zeroes.

Note that any violation of bit stuffing will result in the FrameError status bit being set and the incoming
packet will be treated as an errored packet. Furthermore, if the Start field is not received as 0xAA the
FrameError status bit is set and incoming data is discarded until a correct Start field is detected.

In truly random data such a bit stuffing scheme could cause an overhead of approx. 0.15%. While the
data transmitted over the ISI will not be truly random (0x00 and 0xFF are likely to occur more often than
they would in a random data set) the overhead should remain low and will never exceed 11.1% (i.e. 1 in
every 9 bits).
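Under the same byte-wise reading of the stuffing rules (an assumption: only the three byte values named above are stuffed, each gaining one bit), both overhead figures follow directly. In random data 3 of the 256 byte values gain one extra bit; in the worst case every byte does, so 1 of every 9 transmitted bits is a stuffed bit:

$$\text{random data: } \frac{3}{256}\cdot\frac{1}{8} = \frac{3}{2048} \approx 0.146\,\%, \qquad \text{worst case: } \frac{1}{8+1} \approx 11.1\,\%$$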

12.6.4.3 Stop field description

A 2-bit Stop field (= b11) is used to ensure that both lines return to the high state before the next packet is
transmitted. Two bits are required because the Stop field will be interleaved over both ISI lines (4 bits
would be used in a 4-wire ISI). The Stop field is not subject to bit stuffing because bit stuffing could result
in the final transmitted bit being a 0 on one of the ISI lines.

12.6.4.4 ISI long packet description

The format of a long ISI packet is shown in Figure 31 below. Data may only be transferred between ISI
devices using a long packet, as both the short and ping packets have no payload field. Except in the case of
a broadcast packet, the receiving ISI device will always reply to a long packet with either an explicit ACK
(no error detected in received packet) or an inferred NAK (an error was detected in the received packet).




Figure 31. ISI long packet: Start (8 bits), PktDesc (3 bits), Address (5 bits, b0-b4: a 4-bit ISIId and a
1-bit ISISubId), Payload (256 bits), CRC (16 bits), Stop (2 bits)
All long packets begin with the Start field as described earlier. The PktDesc field is described in Table 29.
Table 29. PktDesc field description

Bit  Description
0    Packet type indicator:
     1 = Short packet
     0 = Non-short (i.e. long/ping) packet
1    Data payload present indicator:
     1 = This packet contains payload (i.e. long packet)
     0 = This packet has no payload
2    Sequence bit value. Only valid for long packets. See section 12.6.4.8 for a
     description of sequence bit operation.

Doc: SoPEC_hardware_design   Version: 2.3   S3 Proprietary Document   29 Nov 2002



Any ISI device in the system may transmit a long packet but only the ISIMaster may initiate an ISI
transaction using a long packet. An ISISlave may only send a long packet in reply to a ping message from
the ISIMaster. A long packet from an ISISlave may be addressed to any ISI device in the system, although
the ISIMaster (or the PrintMaster if it is a different device) will be the usual recipient.

The Address field is straightforward and complies with the ISI naming convention described in section 
12.7. 
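Table 29 and the Address field together make up the 8 bits that follow the Start field. The packing below is hypothetical: it assumes PktDesc occupies bits 0-2 (so it is transmitted first), the 4-bit ISIId bits 3-6 and the ISISubId bit 7. The macro and function names are likewise illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* PktDesc bits (Table 29). */
#define PKTDESC_SHORT   (1u << 0) /* 1 = short packet                */
#define PKTDESC_PAYLOAD (1u << 1) /* 1 = payload present (long)      */
#define PKTDESC_SEQ     (1u << 2) /* sequence bit, long packets only */

/* Hypothetical layout: PktDesc in bits 0-2, ISIId in bits 3-6, ISISubId in bit 7. */
static uint8_t isi_header(unsigned pktdesc, unsigned isi_id, unsigned sub_id) {
    assert(pktdesc <= 0x7u && isi_id <= 0xFu && sub_id <= 1u);
    return (uint8_t)(pktdesc | (isi_id << 3) | (sub_id << 7));
}

/* Long packet to ISIId.ISISubId carrying the given sequence bit. */
static uint8_t isi_long_header(unsigned isi_id, unsigned sub_id, unsigned seq) {
    return isi_header(PKTDESC_PAYLOAD | (seq ? PKTDESC_SEQ : 0u), isi_id, sub_id);
}

/* Ping packet: PktDesc is always b000 and ISISubId is always 1. */
static uint8_t isi_ping_header(unsigned isi_id) {
    return isi_header(0u, isi_id, 1u);
}
```

Under this assumed layout a ping to ISIId3 would carry the header byte 0x98.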

The payload field is exactly what is in the transmit buffer of the transmitting ISI device and gets copied
into the receive buffer of the addressed ISI device(s). When present, the payload field is always 256 bits.

To ensure strong error detection a 16-bit CRC is appended. This CRC is calculated over the entire packet
(excluding the Start and Stop fields). The HDLC standard CRC-16 (i.e. $G(x) = x^{16} + x^{12} + x^{5} + 1$)
is to be used for this calculation, which is to be performed serially.
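A bit-serial CRC-16 using the generator polynomial above can be sketched in C as follows. Bits are fed in ISI transmission order (lsb first), which is why the reflected form of the polynomial, 0x8408, is used. The document does not state the initial register value, whether a final inversion is applied, or how the CRC bits are ordered on the wire; this sketch assumes an initial value of 0xFFFF (as in HDLC), no final inversion and the CRC appended low byte first, so it is not the complete HDLC FCS procedure. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* G(x) = x^16 + x^12 + x^5 + 1 (0x1021), bit-reversed for lsb-first input. */
#define CRC16_POLY_REFLECTED 0x8408u

/* Clock one bit into the CRC register, as a serial hardware CRC would. */
static uint16_t crc16_bit(uint16_t crc, unsigned bit) {
    unsigned feedback = (crc ^ bit) & 1u;
    crc >>= 1;
    if (feedback)
        crc ^= CRC16_POLY_REFLECTED;
    return crc;
}

/* CRC over whole bytes, each fed lsb first.
 * Assumptions: initial value 0xFFFF, no final inversion. */
static uint16_t isi_crc16(const uint8_t *data, size_t len) {
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++)
        for (int b = 0; b < 8; b++)
            crc = crc16_bit(crc, (data[i] >> b) & 1u);
    return crc;
}
```

With these assumptions a receiver can run the same CRC over the received header, payload and CRC together and expect a result of zero; any single-bit error leaves a non-zero remainder.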

12.6.4.5 ISI ping packet

The ISI ping packet is used to allow ISISlaves to transmit on the ISI bus. As can be seen from Figure 32
below, the ping packet can be viewed as a special case of the long packet. In other words it is a long
packet without any payload, whose PktDesc field is always b000 and whose ISISubId is always 1. The
ISISubId is unused in ping packets because the ISIMaster is addressing the ISI device rather than one of
the DMA channels in the device. The ISISlave may address any ISIId.ISISubId in response if it wishes.
The ISISlave will respond to a ping packet with either an explicit ACK (if it has nothing to send), an
inferred NAK (if it detected an error in the ping packet) or a long packet (containing the data it wishes to
send). Note that inferred NAKs do not result in the retransmission of a ping packet. This is because the
ping packet will be retransmitted on a predetermined schedule (see 12.6.4.10 for more details).
Figure 32. ISI ping packet

An ISISlave should never respond to a ping message to the broadcast ISIId, as this must have been sent in
error. An ISI ping packet will never be sent in response to any packet and may only originate from an
ISIMaster.
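The ISISlave's response rules for a ping can be summarised as a small decision function. This is a behavioural sketch with hypothetical names; `crc_ok` and `have_tx_packet` stand in for the slave's CRC check and the state of its transmit buffer.

```c
#include <assert.h>
#include <stdbool.h>

#define ISI_BROADCAST_ID 0xFu

typedef enum {
    ISI_REPLY_NONE, /* stay silent (an errored ping yields an inferred NAK) */
    ISI_REPLY_ACK,  /* explicit short ACK: nothing to send                  */
    ISI_REPLY_LONG  /* send the pending long packet                         */
} isi_ping_reply;

static isi_ping_reply isi_on_ping(unsigned my_id, unsigned ping_id,
                                  bool crc_ok, bool have_tx_packet) {
    if (!crc_ok)
        return ISI_REPLY_NONE;  /* errored ping: no response          */
    if (ping_id == ISI_BROADCAST_ID)
        return ISI_REPLY_NONE;  /* a broadcast ping was sent in error */
    if (ping_id != my_id)
        return ISI_REPLY_NONE;  /* addressed to another device        */
    return have_tx_packet ? ISI_REPLY_LONG : ISI_REPLY_ACK;
}
```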
12.6.4.6 ISI short packet description

The ISI short packet is only 14 bits long, including the Start and Stop fields. A value of b1001 is proposed
for the ACK symbol. As a 16-bit CRC is inappropriate for such a short packet, it is not used. In fact there is
only one valid value for a 14-bit short ACK packet, as the Start, ACK and Stop symbols all have fixed
values. Short packets are only used for acknowledgements (i.e. explicit ACKs). The format of a short ISI
packet is shown in Figure 33 below.



Figure 33. Short ISI packet: Start (8 bits), ACK symbol (4 bits), Stop (2 bits)

12.6.4.7 Error detection and retransmission

The 16-bit CRC will provide a high degree of error detection, and the probability of transmission errors
occurring is very low as the transmission channel (i.e. PCB traces) will have a low inherent bit error rate.
The number of undetected errors should therefore be minute. A simple retransmission mechanism frees
the CPU from getting involved in error recovery for most errors, because the probability of a transmission
error occurring more than once in succession is very, very low in normal circumstances.

After each non-short ISI packet is transmitted the transmitting device will open a reply window. The size
of the reply window will be 9 bit times (i.e. 14 bits transmitted on two wires, or 7 bit times, plus 2 bit times
to allow for bus turnarounds and timing differences) when a short packet is expected, and 147 bit times
(i.e. 290 bits transmitted on two wires, or 145 bit times, plus 2 bit times to allow for bus turnarounds and
timing differences) when a long packet is expected in reply.
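The packet lengths and reply-window sizes quoted above follow from the field widths of Figures 31 and 33. In the sketch below the function names are hypothetical, and the ping-packet length is derived from its description as a long packet without payload rather than stated in the text.

```c
#include <assert.h>

/* Field widths in bits (Figures 31 and 33). */
enum {
    START_BITS = 8, PKTDESC_BITS = 3, ADDR_BITS = 5, PAYLOAD_BITS = 256,
    CRC_BITS = 16, STOP_BITS = 2, ACK_SYMBOL_BITS = 4
};

static unsigned long_packet_bits(void) {
    return START_BITS + PKTDESC_BITS + ADDR_BITS + PAYLOAD_BITS + CRC_BITS + STOP_BITS;
}

/* A ping packet is a long packet without the payload field. */
static unsigned ping_packet_bits(void) {
    return long_packet_bits() - PAYLOAD_BITS;
}

static unsigned short_packet_bits(void) {
    return START_BITS + ACK_SYMBOL_BITS + STOP_BITS;
}

/* Reply window in bit times: the expected reply spread over the ISI lines,
 * plus 2 bit times for bus turnaround and timing differences. */
static unsigned reply_window(unsigned reply_bits, unsigned lines) {
    return reply_bits / lines + 2;
}
```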

When a packet has been received without any errors the receiving ISI device must transmit its
acknowledge packet (which may be either a long or short packet) before the reply window closes. When
detected errors do occur the receiving ISI device will not send any response. The transmitting ISI device
interprets this lack of response as a NAK, indicating that errors were detected in the transmitted packet or
that the receiving device was unable to receive the packet for some reason. If a long packet was transmitted,
the transmitting ISI device will keep the transmitted packet in its transmit buffer for retransmission. If the
transmitting device is the ISIMaster it will retransmit the packet immediately, while if the transmitting
device is an ISISlave it will retransmit the packet in response to the next ping it receives from the
ISIMaster.

The transmitting ISI device will continue retransmitting the packet when it receives a NAK until it either
receives an ACK or the number of retransmission attempts equals the value of the NumRetries register. If
the transmission was unsuccessful then the transmitting device sets the TxError bit in its ISIStatus register.
The receiving device also sets the RxError bit in its ISIStatus register whenever it detects NumRetries + 1
errored packets in succession. The NumRetries registers in all ISI devices should therefore be set to the
same value for consistent operation. Note that successful transmission or reception of ping packets does not
affect retransmission operation. Open Issue: In the case of an ISI device receiving a packet in error from
an ISISlave, the NumRetries count will be reset if it subsequently receives an error-free packet from any ISI
device (which may not be the ISISlave that transmitted the errored packet). Thus the RxError operation is
only effective for ISIMaster to ISISlave transactions, as these are the only ones where retransmissions will
be sequential. Either we live with this or we could implement a NumRetriesCount window which would
allow all NAKs within a specified window to be counted. If NumRetries is exceeded within this window
then we have an RxError, otherwise we can reset the count.
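The transmit-side retry rule can be sketched as a loop: the packet is sent once, then retransmitted after each inferred NAK until an ACK arrives or NumRetries retransmissions have been made, at which point the ISI would set TxError. The send hook and the simulated link are hypothetical stand-ins for the hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hook: sends the packet once and returns true if an ACK arrived
 * before the reply window closed (false = inferred NAK). */
typedef bool (*isi_send_fn)(void *ctx);

/* Returns true on success; false is the case in which the ISI would set
 * TxError and stop transmitting. *attempts counts all transmissions. */
static bool isi_send_with_retries(isi_send_fn send, void *ctx,
                                  unsigned num_retries, unsigned *attempts) {
    unsigned retries = 0;
    *attempts = 0;
    for (;;) {
        ++*attempts;
        if (send(ctx))
            return true;         /* ACK received                  */
        if (retries == num_retries)
            return false;        /* NumRetries reached: TxError   */
        ++retries;               /* inferred NAK: retransmit      */
    }
}

/* Simulated link for illustration: NAKs the first `failures` transmissions. */
typedef struct { unsigned failures; } fake_link;

static bool fake_send(void *ctx) {
    fake_link *link = ctx;
    if (link->failures > 0) {
        link->failures--;
        return false;
    }
    return true;
}
```

With NumRetries = 2 a packet is transmitted at most three times, which matches the receiver's RxError threshold of NumRetries + 1 errored packets.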
Note that either a transmit or receive error will cause the ISI to stop transmitting or receiving respectively.
CPU intervention will be required to resolve the source of the problem and to restart the ISI transmit or
receive operation. Transmit or receive errors should be extremely rare and their occurrence will most
likely indicate a serious problem.

Note that broadcast packets are never acknowledged, to avoid contention on the common ISI lines. If an
ISISlave detects an error in a broadcast packet it must use the message passing mechanism described
earlier to alert the ISIMaster to the error.



12.6.4.8 Sequence bit operation

To ensure that communication between transmitting and receiving ISI devices is correctly ordered, a
sequence bit is included in every long packet to keep both devices in step with each other. Sequence bits
are not used for short or ping packets as they are not used for data transmission. In addition to the
transmitted sequence bit, all ISI devices keep two local sequence bits, one for each ISISubId. Furthermore
each ISI device maintains a transmit sequence bit for each ISIId and ISISubId it is in communication with.
For packets sourced from the host (via USB) the transmit sequence bit is contained in the relevant
USBEPnDest register, while for packets sourced from the CPU the transmit sequence bit is contained in the
CPUISITxBuffCntrl register. The sequence bits for received packets are stored in the DMA0SeqBit and
DMA1SeqBit registers. All ISI devices will initialise their sequence bits to 0 after reset. It is the
responsibility of software to ensure that the sequence bits of the transmitting and receiving ISI devices are
correctly initialised each time a new source is selected for any ISIId.ISISubId channel.

Sequence bits are not used in any broadcast or ping packets. Each SoPEC may also ignore the sequence
bit on either of its ISISubId channels by setting the appropriate bit in the SequenceMask register. The
sequence bit should be ignored for ISISubId channels that will carry data that can originate from more
than one source and is self-ordering, e.g. control messages.

A receiving ISI device will toggle its sequence bit addressed by the ISISubId only when the receiver is
able to accept data and receives an error-free data packet addressed to it. The transmitting ISI device will
toggle its sequence bit for that ISIId.ISISubId channel only when it receives a valid ACK handshake from
the addressed ISI device.

Figure 34 shows the transmission of two long packets with the sequence bit in both the transmitting and
receiving devices toggling from 0 to 1 and back to 0 again. The toggling operation will continue in this
manner in every subsequent transmission until an error condition is encountered.
Figure 34. Successful transmission of two long packets with sequence bit toggling
When the receiving ISI device detects an error in the transmitted long packet or is unable to accept the
packet (because of full buffers, for example) it will not return any packet and it will not toggle its local
sequence bit. An example of this is depicted in Figure 35. The absence of any response prompts the
transmitting device to retransmit the original (seq=0) packet. This time the packet is received without any
errors (or buffer space may have been freed) so the receiving ISI device toggles its local sequence bit and
responds with an ACK. The transmitting device then toggles its local sequence bit to a 1 upon correct
receipt of the ACK.



Figure 35. Sequence bit operation with errored long packet (transmitting ISI device and receiving ISI device)



However, it is also possible for the ACK packet from the receiving ISI device to be corrupted, and this
scenario is shown in Figure 36. In this case the receiving device toggles its local sequence bit to 1 when the
long packet is received without error and replies with an ACK to the transmitting device. The transmitting
device detects an error in the ACK packet and so will not change its local sequence bit. It then retransmits
the seq=0 long packet. When the receiving device finds that there is a mismatch between the transmitted
sequence bit and the expected (local) sequence bit, it discards the long packet and replies with an ACK.
When the transmitting ISI device correctly receives the ACK it updates its local sequence bit to a 1, thus
restoring synchronization. Note that when the SequenceMask bit for the addressed ISISubId is set, the
retransmitted packet is not discarded and so a duplicate packet will be received. The data contained in the
packet should be self-ordering and so the software handling these packets (most likely control messages)
is expected to deal with this eventuality.
Figure 36. Sequence bit operation with ACK error
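The sequence-bit rules, including the lost-ACK case of Figure 36, can be modelled in a few lines. This is a behavioural sketch with hypothetical names: `seq[]` plays the role of the DMA0SeqBit/DMA1SeqBit registers and `seq_mask` that of SequenceMask. The text does not say whether the local bit still toggles when the mask bit is set; this sketch leaves it unchanged in that case.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    unsigned seq[2];   /* local sequence bit per ISISubId                   */
    unsigned seq_mask; /* bit n set = ignore the sequence bit on ISISubId n */
} isi_rx;

typedef enum { RX_NO_REPLY, RX_ACK_ACCEPT, RX_ACK_DISCARD } rx_result;

/* Receiver handling of a long packet addressed to this device. */
static rx_result isi_rx_long(isi_rx *rx, unsigned sub_id, unsigned pkt_seq,
                             bool crc_ok, bool can_accept) {
    if (!crc_ok || !can_accept)
        return RX_NO_REPLY;      /* inferred NAK; local bit unchanged  */
    if (rx->seq_mask & (1u << sub_id))
        return RX_ACK_ACCEPT;    /* sequence bit ignored on this channel */
    if (pkt_seq != rx->seq[sub_id])
        return RX_ACK_DISCARD;   /* duplicate after a lost ACK: ACK it */
    rx->seq[sub_id] ^= 1u;       /* accepted: toggle local bit          */
    return RX_ACK_ACCEPT;
}

/* Transmitter side: one sequence bit per ISIId.ISISubId channel. */
typedef struct { unsigned seq; } isi_tx_chan;

static void isi_tx_on_reply(isi_tx_chan *ch, bool valid_ack) {
    if (valid_ack)
        ch->seq ^= 1u;           /* toggle only on a valid ACK */
}
```

Replaying Figure 36 with this model (ACK corrupted, seq=0 packet retransmitted and discarded, second ACK received) leaves both sides with a sequence bit of 1.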
12.6.4.9 Flow control

The ISI also supports flow control by treating it in exactly the same manner as an error in the received
packet. Because the SCB enjoys greater guaranteed bandwidth to DRAM than both the ISI and USB can
supply, flow control should not be required during normal operation. Any blockage on a DMA channel will
soon result in the NumRetries value being exceeded and transmission to that DMA channel being halted.
Because flow control is treated in the same manner as an error in the received packet, neither the
transmitting nor the receiving ISI device will be able to differentiate the cause of a TxError or RxError.

12.6.4.10 Auto-ping operation

While the CPU of the ISIMaster could send a ping packet by writing the appropriate header to the
CPUISITxBuffCntrl register, it is expected that all ping packets will be generated in the ISI itself. The use
of automatically generated ping packets ensures that ISISlaves will be given access to the ISI bus with a
programmable minimum guaranteed frequency, in addition to whenever the bus is idle. Five registers
facilitate the automatic generation of ping messages within the ISI: PingSchedule0, PingSchedule1,
PingSchedule2, ISITotalPeriod and ISILocalPeriod. Auto-pinging can be enabled or disabled by writing to
the AutoPingEnable bit of the ISICntrl register.

Each bit of the 14-bit PingScheduleN register corresponds to an ISIId that is used in the Address field of
the ping packet, and a 1 in the bit position indicates that a ping packet is to be generated for that ISIId. A 0
in any bit position will ensure that no ping packet is generated for that ISIId. As ISISlaves may differ in
their bandwidth requirement (particularly if a storage SoPEC is present), three different PingSchedule
registers are used to allow an ISISlave to receive up to three times the number of pings as another active
ISISlave. When the ISIMaster is not sending long packets (sourced from either the CPU or USB in the
case of a SoPEC ISIMaster), ISI ping packets will be transmitted according to the pattern given by the three
PingScheduleN registers. The ISI will start with the lsb of the PingSchedule0 register and work its way from
lsb through msb of each of the PingScheduleN registers. When the msb of PingSchedule2 is reached the
ISI returns to the lsb of PingSchedule0 and continues to cycle through each bit position of each
PingScheduleN register.
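The PingScheduleN rotation can be sketched as a cursor that steps through all 42 bit positions in order, skipping zeros. Names are hypothetical; bit n of each register is taken to select ISIId n+1, as listed in Table 30.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t sched[3];  /* 14-bit PingSchedule0..2 values            */
    unsigned reg, bit;  /* current position in the lsb-to-msb rotation */
} ping_scheduler;

/* Returns the ISIId of the next ping, or 0 if every schedule bit is clear. */
static unsigned next_ping_id(ping_scheduler *s) {
    for (unsigned steps = 0; steps < 3 * 14; steps++) {
        unsigned reg = s->reg, bit = s->bit;
        /* advance: lsb to msb within a register, then the next register */
        if (++s->bit == 14) {
            s->bit = 0;
            s->reg = (s->reg + 1) % 3;
        }
        if (s->sched[reg] & (1u << bit))
            return bit + 1;     /* bit n selects ISIId n+1 */
    }
    return 0;
}
```

With the schedule values of the worked example below (0x07, 0x06, 0x04) one rotation yields ISIIds 1, 2, 3, 2, 3, 3.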

With the addition of auto-ping operation we now have three potential sources of packets in an ISIMaster
SoPEC: USB, CPU and auto-ping. Arbitration between the CPU and USB for access to the ISI is handled
outside the ISI (see section 12.7.7), but arbitration between auto-ping packets and CPU/USB originating
packets, which we will refer to as local packets, happens within the ISI. To ensure that local packets get
priority whenever possible and that ping packets can have some guaranteed access to the ISI, we use two
4-bit counters whose reload values are contained in the ISITotalPeriod and ISILocalPeriod registers. As
described in 12.6.4.1, every ISI transaction is initiated by the ISIMaster transmitting either a long packet or a
ping packet. The ISITotalPeriod counter is decremented for every ISI transaction when contention occurs
(i.e. both a ping and a local packet wish to transmit), while the ISILocalPeriod counter is decremented for
every local packet that is transmitted. Neither counter is decremented by a retransmitted packet.

The amount of guaranteed ISI bandwidth allocated to both local and ping packets is determined by the
values of the ISITotalPeriod and ISILocalPeriod registers. Local packets will always be given priority when
the ISILocalPeriod counter is non-zero. Ping packets will be given priority when the ISILocalPeriod
counter is zero and the ISITotalPeriod counter is still non-zero. Both the ISITotalPeriod and ISILocalPeriod
counters are reloaded by the next local packet transmit request after the ISITotalPeriod counter has
reached zero. This reload policy minimises the maximum latency for ping packets at the expense of
maximum latency for local packets.

Note that ping packets are quite likely to get more than their guaranteed bandwidth as they will be
transmitted whenever the ISI bus is idle (i.e. no pending local packets) and so do not decrement either
counter. Local packets on the other hand will never get more than their guaranteed bandwidth because each
local packet transmitted decrements both counters. The difference between the values of the
ISITotalPeriod and ISILocalPeriod registers determines the number of automatically generated ping packets
that are guaranteed to be transmitted every ISITotalPeriod number of ISI transactions. If the ISITotalPeriod
and ISILocalPeriod values are the same then the local packets will always get priority and could totally
exclude ping packets if the CPU always has packets to send.

For example, if ISITotalPeriod = 0xC, ISILocalPeriod = 0x8, PingSchedule0 = 0x07, PingSchedule1 = 0x06
and PingSchedule2 = 0x04, then four ping messages are guaranteed to be sent in every 12 ISI transactions.
Furthermore ISIId3 will receive 3 times the number of ping packets as ISIId1, and ISIId2 will receive twice
as many as ISIId1. Thus over a period of 36 contended ISI transactions (allowing for two full rotations
through the three PingScheduleN registers) when local packets are always pending, 24 local packets will
be sent, ISIId1 will receive 2 ping packets, ISIId2 will receive 4 pings and ISIId3 will receive 6 ping
packets. If local traffic is less frequent then the ping frequency will automatically adjust upwards to
consume all idle ISI bandwidth.
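The counter arbitration can be simulated for the fully contended case, in which a local packet and a ping are always both pending. The sketch models only the rules stated above (retransmissions, which decrement neither counter, are omitted), and its names are hypothetical. Applied to the example values it gives 8 local and 4 ping transactions in every 12, i.e. 24 and 12 over 36.

```c
#include <assert.h>

/* Reload values (ISITotalPeriod, ISILocalPeriod) and the 4-bit down-counters. */
typedef struct {
    unsigned total_reload, local_reload;
    unsigned total, local;
} ping_arbiter;

typedef enum { SEND_LOCAL, SEND_PING } arb_choice;

/* One contended transaction: a local packet and a ping are both pending. */
static arb_choice arbitrate_contended(ping_arbiter *a) {
    if (a->total == 0) {        /* reload on the next local transmit request */
        a->total = a->total_reload & 0xFu;
        a->local = a->local_reload & 0xFu;
    }
    arb_choice choice = (a->local != 0) ? SEND_LOCAL : SEND_PING;
    if (a->total != 0)
        a->total--;             /* every contended transaction */
    if (choice == SEND_LOCAL)
        a->local--;             /* every local packet sent     */
    return choice;
}
```

Setting both reload values equal (e.g. 8 and 8) makes the simulation send only local packets, as the preceding paragraph warns.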

12.6.4.11 ISI Registers

Table 30 below details the ISI configuration registers. Note that some of these registers are also used by 
other blocks in the SCB. 



Table 30. ISI configuration registers

Address    Register            #bits  Reset   Description
0x00       ISICntrl            5      0x2     ISI Control register
0x04       ISIId               4      0x1     ISIId for this SoPEC. A value of 0 indicates the
                                              device is an ISIMaster. Note that the SoPEC resets
                                              to being an ISISlave and that 0xF (the broadcast
                                              ISIId) is an illegal value and should not be written
                                              to this register.
0x08       NumRetries          4      0x02    Number of retransmissions to attempt in response to
                                              a NAK before aborting a long packet transmission
0x0C       ISIPingSchedule0    14     0x0000  Denotes which ISIIds will receive ping packets.
                                              Note that bit0 refers to ISIId1, bit1 to ISIId2, ...,
                                              bit13 to ISIId14.
0x10       ISIPingSchedule1    14     0x0000  As per ISIPingSchedule0
0x14       ISIPingSchedule2    14     0x0000  As per ISIPingSchedule0
0x18       ISITotalPeriod      4      0xF     Reload value of the ISITotalPeriod counter
0x1C       ISILocalPeriod      4      0xF     Reload value of the ISILocalPeriod counter
0x20       ISIStatus           6      0x00    ISI Status register. This register is read-only.
0x24       ISIMask             6      0x00    ISI Interrupt Mask register
0x30-0x4C  CPUISITxBuff        32     n/a     32-byte CPU ISI transmit buffer
0x50       CPUISITxBuffCntrl   13     0x0000  Control register for the CPU ISI transmit buffer
0x60-0x7C  CPUISIRxBuff        32     n/a     32-byte ISI receive buffer. This is the half of the
                                              double buffer that contains the oldest data.
0x80       ISIRxBuffDest       1      0x0     Only one of the CPU and the DMA manager is
                                              allowed to empty the receive buffer at any time.
                                              1 = CPU will empty the receive buffer
                                              0 = DMA manager will empty the receive buffer



12.6.4.11.1 ISI control register 

The ISICntrl register is described in Table 31 below. Note that the reset value of this register allows the
SoPEC to automatically become an ISIMaster (AutoMasterEnable = 1) if any USB packets are received on
endpoints 2-4. On becoming an ISIMaster the ISIId register is set to 0, the TxEnable bit of the ISICntrl
register is set and any USB or CPU packets destined for other ISI devices are transmitted. The CPU can
override this capability at any time by clearing the AutoMasterEnable bit. Automatic ping operation can only
be enabled by the CPU, as the reset values of the PingScheduleN registers are all 0 and neither DMA
channel is automatically configured.



Table 31. ISICntrl register

Field Name        Bit(s) Description
TxEnable          0      Enables ISI transmission of long or ping packets. This
                         is cleared by transmit errors and so needs to be
                         restarted by the CPU. Note that ACKs may still be
                         transmitted when this bit is 0.
                         1 = Transmission enabled
                         0 = Transmission disabled
RxEnable          1      Enables ISI reception. This is cleared by receive errors
                         and so needs to be restarted by the CPU.
                         1 = Reception enabled
                         0 = Reception disabled
AutoPingEnable    2      Enables auto-ping operation
                         1 = auto-ping enabled
                         0 = auto-ping disabled
AutoMasterEnable  3      Enables the device to automatically become the
                         ISIMaster if activity is detected on USB endpoints 2-4.
                         1 = auto-master operation enabled
                         0 = auto-master operation disabled
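The enable bits in Table 31 are handled with ordinary read-modify-write updates. The following C sketch, whose helper names are hypothetical, computes the value the CPU would write back to restart transmission or reception after an error has cleared TxEnable or RxEnable (a write that, per section 12.6.4.11.2, also clears the ISIStatus bits), and the value that overrides auto-master operation. Register access itself is left out; the functions only compute values.

```c
#include <assert.h>
#include <stdint.h>

/* ISICntrl bit positions from Table 31 */
#define ISICNTRL_TX_ENABLE          (1u << 0)
#define ISICNTRL_RX_ENABLE          (1u << 1)
#define ISICNTRL_AUTO_PING_ENABLE   (1u << 2)
#define ISICNTRL_AUTO_MASTER_ENABLE (1u << 3)

/* Value to write back to ISICntrl to restart transmission and/or
 * reception after an error cleared the enable bit(s). All other bits,
 * including auto-ping and auto-master, are preserved unchanged. */
static uint32_t isicntrl_restart(uint32_t current, int restart_tx, int restart_rx)
{
    assert(current <= 0x1Fu);   /* ISICntrl is 5 bits wide (Table 30) */
    uint32_t v = current;
    if (restart_tx)
        v |= ISICNTRL_TX_ENABLE;
    if (restart_rx)
        v |= ISICNTRL_RX_ENABLE;
    return v;
}

/* Value to write to ISICntrl so the CPU overrides automatic
 * ISIMaster operation by clearing AutoMasterEnable. */
static uint32_t isicntrl_disable_auto_master(uint32_t current)
{
    return current & ~ISICNTRL_AUTO_MASTER_ENABLE;
}
```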



12.6.4.11.2 ISI status register 

The ISIStatus register is read-only to the CPU. Status bits are set by the relevant condition occurring and
are cleared by writing to either the TxEnable or RxEnable bits of the ISICntrl register or to the
CPUISITxBuff.



Table 32. ISIStatus register

Field Name         Bit(s) Description
FrameError         0      Framing error detected in the received packet. This can
                          be caused by an incorrect Start or Stop field or by bit
                          stuffing errors.
RxError            1      A CRC error or flow control condition was detected in
                          NumRetries+1 successive packets (excluding ping packets).
RxBuffFull         2      There is no space remaining in the receive double buffer.
RxBuffOverflow     3      An overflow has occurred in the ISI receive buffer and a
                          packet had to be dropped.
CPUISITxBuffEmpty  4      The CPUISITxBuff is empty.
TxError            5      Transmission error. Receiving ISI device would not
                          accept the transmitted packet. Only set after NumRetries
                          unsuccessful retransmissions (excluding ping packets).
12.6.4.11.3 ISI mask register

An interrupt will be generated in an edge-sensitive manner, i.e. the ISI will generate an isi_icu_irq pulse
each time a status bit goes high and the corresponding bit of the ISIMask register is enabled.



Table 33. ISIMask register

Field Name              Bit(s) Description
FrameErrorIntEn         0      Interrupt enable mask bit for the FrameError status bit
RxErrorIntEn            1      Interrupt enable mask bit for the RxError status bit
RxBuffFullIntEn         2      Interrupt enable mask bit for the RxBuffFull status bit
RxBuffOverflowIntEn     3      Interrupt enable mask bit for the RxBuffOverflow status
                               bit
CPUISITxBuffEmptyIntEn  4      Interrupt enable mask bit for the CPUISITxBuffEmpty
                               status bit
TxErrorIntEn            5      Interrupt enable mask bit for the TxError status bit
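The edge-sensitive behaviour described in section 12.6.4.11.3 amounts to "rising status bits AND enabled mask bits". The C sketch below models it using the shared bit positions of Tables 32 and 33; the function and enum names are illustrative only. A bit that is already high does not pulse again, even if its mask bit is set.

```c
#include <assert.h>
#include <stdint.h>

/* Shared bit positions of ISIStatus (Table 32) and ISIMask (Table 33) */
enum {
    ISI_FRAME_ERROR          = 1 << 0,
    ISI_RX_ERROR             = 1 << 1,
    ISI_RX_BUFF_FULL         = 1 << 2,
    ISI_RX_BUFF_OVERFLOW     = 1 << 3,
    ISI_CPUISI_TX_BUFF_EMPTY = 1 << 4,
    ISI_TX_ERROR             = 1 << 5
};

/* Edge-sensitive model: an interrupt pulse is produced only for status
 * bits that go from 0 to 1 and whose ISIMask bit is enabled. Returns the
 * bits responsible for the pulse; 0 means no pulse. */
static uint32_t isi_irq_edges(uint32_t prev_status, uint32_t new_status, uint32_t mask)
{
    assert(mask <= 0x3Fu);                         /* ISIMask is 6 bits wide */
    uint32_t rising = new_status & ~prev_status;   /* bits that went 0 -> 1 */
    return rising & mask;
}
```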



12.6.4.11.4 CPUISITxBuffCntrl register

The CPUISITxBuffCntrl register contains the header field for the packet in the CPUISI transmit buffer.
Writing to this register validates the contents of the CPUISI transmit buffer, i.e. each time the CPU places a
packet in the CPUISI transmit buffer it must write the packet header to this register to initiate its transfer
into the SCB transmit buffer (see section 12.7). Note that the CPU is responsible for toggling the sequence
bit of any long packets it wishes to transmit. The CPUISITxBuffEmpty status bit will be set when
CPUTxPktSize bytes have been transferred to the SCB transmit buffer.



Table 34. CPUISITxBuffCntrl register

Field Name    Bit(s) Description
PktDesc       2:0    PktDesc field (as per Table 29) for the packet currently
                     in the CPUISI transmit buffer.
DestISISubId  3      Indicates which DMAChannel of the target SoPEC the data
                     in the CPUISI transmit buffer is destined for:
                     0 = DMAChannel0
                     1 = DMAChannel1
DestISIId     7:4    Denotes the ISIId of the target SoPEC as per Table 35
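The header written to CPUISITxBuffCntrl is a simple packing of the three fields in Table 34. A minimal C sketch follows; the function name is hypothetical and, because the PktDesc encodings of Table 29 are not reproduced in this section, the PktDesc values below are arbitrary placeholders.

```c
#include <assert.h>
#include <stdint.h>

#define ISI_BROADCAST_ID 15u

/* Packs the CPUISITxBuffCntrl fields of Table 34:
 *   bits 2:0  PktDesc (encoding per Table 29)
 *   bit  3    DestISISubId (0 = DMAChannel0, 1 = DMAChannel1)
 *   bits 7:4  DestISIId (Table 35; 15 = broadcast) */
static uint32_t cpuisi_tx_cntrl(uint32_t pkt_desc, uint32_t dest_subid, uint32_t dest_id)
{
    assert(pkt_desc <= 7u && dest_subid <= 1u && dest_id <= 15u);
    return (dest_id << 4) | (dest_subid << 3) | pkt_desc;
}
```

For example, a packet for ISI2.1 with a PktDesc of 0 packs to 0x28, and one for the broadcast ISIId with a PktDesc of 5 packs to 0xF5.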



12.7 SCB Mapping 



In order to support maximum flexibility when moving data through a multi-SoPEC system it is possible to
map any USB endpoint onto either DMAChannel within any SoPEC in the system. A logical view of the
SCB is shown in Figure 37. This view differs from the likely implementation but it allows for a clearer
depiction of data movement within the SCB.
Figure 37. SCB logical view

The SCB map, and indeed the SCB itself, is based around the concept of an ISIId and an ISISubId. Each
SoPEC in the system has a unique ISIId and two ISISubIds, namely ISISubId0 and ISISubId1. We use the
convention that ISISubId0 corresponds to DMAChannel0 in each SoPEC and ISISubId1 corresponds to
DMAChannel1. The naming convention for the ISIId is shown in Table 35 below and this would correspond
to a multi-SoPEC system such as that shown in Figure 27. We use the term ISIId instead of SoPECId to
avoid confusion with the unique ChipID used to create the SoPEC_id and SoPEC_id_key (see chapter 17
and [9] for more details).



Table 35. ISIId naming convention

ISIId   Description
0       ISIMaster (typically a SoPEC connected to the host via USB1.1)
1-14    ISISlave1-14
15      Broadcast ISIId



The combined ISIId and ISISubId therefore allow us to address any DMAChannel in the system. The ISI,
DMA manager and SCB map hardware use the ISIId and ISISubId to handle the different data streams that
are active in a multi-SoPEC system, as does the software running on the CPU of each SoPEC. In this
document we will identify DMAChannels as ISIx.y where x is the ISIId and y is the ISISubId. Thus ISI2.1
refers to DMAChannel1 of ISISlave2. Any data sent to a broadcast channel, i.e. ISI15.0 or ISI15.1, is
received by every ISI device in the system including the ISIMaster (which may be an ISI-Bridge).
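The ISIx.y notation and the broadcast rule above can be captured in two small helpers. This C sketch is illustrative only (both function names are hypothetical); it formats a channel name and decides whether a device with a given ISIId accepts a packet, namely when the packet is addressed to its own ISIId or to the broadcast ISIId 15.

```c
#include <assert.h>
#include <stdio.h>

#define ISI_MASTER_ID    0
#define ISI_BROADCAST_ID 15

/* Formats a DMAChannel address in the ISIx.y notation
 * (x = ISIId, y = ISISubId). */
static void isi_channel_name(char *buf, size_t len, int isi_id, int isi_subid)
{
    assert(isi_id >= 0 && isi_id <= 15 && (isi_subid == 0 || isi_subid == 1));
    snprintf(buf, len, "ISI%d.%d", isi_id, isi_subid);
}

/* Every ISI device, the ISIMaster included, accepts packets addressed
 * to its own ISIId or to the broadcast ISIId. */
static int isi_accepts(int dest_id, int own_id)
{
    return dest_id == own_id || dest_id == ISI_BROADCAST_ID;
}
```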

The USB controller and software stacks, however, have no understanding of the ISIId and ISISubId, but the
Silverbrook printer driver software running on the host PC does make use of them. USB is simply used as a
data transport: the mapping of USB endpoints onto ISIId and ISISubId is communicated from the host PC
Silverbrook code to the SoPEC Silverbrook code through USB control (or possibly bulk data) messages, i.e.
the mapping information is simply data payload as far as USB is concerned. The code running on SoPEC is
responsible for parsing these messages and configuring the SCB accordingly.
The use of just two DMAChannels places some limitations on what can be achieved without software
intervention. For every SoPEC in the system there are more potential sources of data than there are sinks.
For example, an ISISlave could receive both control and data messages from the ISIMaster SoPEC in
addition to control and data from the host, either specifically addressed to that particular ISISlave or over
the broadcast ISI channel. However, each ISISlave has only two possible data sinks, i.e. the two
DMAChannels. Another example is the ISIMaster in a multi-SoPEC system, which may receive control
messages from each SoPEC in addition to control and data information from the host (e.g. over USB). In
this case all of the control messages are in contention for access to DMAChannel0. We resolve these
potential conflicts by adopting the following conventions:

1) Control messages may be interleaved in a memory buffer: The memory buffer that
DMAChannel0 points to should be regarded as a central pool of control messages. Every control
message must contain fields that identify the size of the message, the source and the destination of
the control message. Control messages may therefore be multiplexed over a DMAChannel, which
allows several control message sources to address the same DMAChannel. Furthermore, if SoPEC-type
control messages contain source and destination fields it is possible for the host to send control
messages to individual SoPECs over the ISI15.0 broadcast channel.

2) Data messages should not be interleaved in a memory buffer: As data messages are typically
part of a much larger block of data that is being transferred, it is not possible to control their contents
in the same manner as is possible with the control messages. Furthermore, we do not want the CPU
to have to perform reassembly of data blocks. Data messages from different sources cannot be
interleaved over the same DMAChannel; the SCB map must be reconfigured each time a different data
source is given access to the DMAChannel.

3) Every reconfiguration of the SCB map requires the exchange of control messages: The only
active SCB map in a multi-SoPEC system is the SCB map in the ISIMaster, as all ISISlaves
automatically send data addressed to themselves to either DMAChannel0 or DMAChannel1, i.e. the ISI
is the only source of incoming data in an ISISlave. The ISIMaster's SCB map reset state is shown in
Figure 39 and any subsequent modifications require the exchange of control messages between the
ISIMaster and the host. As the host is expected to control the movement of data in any SoPEC system, it
is anticipated that all changes to the SCB map will be performed in response to a request from the
host. While the ISIMaster could autonomously reconfigure the SCB map (this is entirely up to the
software running on the ISIMaster), it should not do so without informing the host in order to avoid
data being misrouted.

An example of the above conventions in operation is worked through in section 12.7.2. 
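Convention 1 relies on every control message carrying its own size, source and destination, so that a single DMAChannel0 buffer can be walked message by message. The C sketch below illustrates that idea only. The 4-byte header layout (16-bit little-endian size, 8-bit source, 8-bit destination) is a hypothetical choice, not a format defined in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical control-message header: bytes 0-1 = total message size
 * (little-endian, header included), byte 2 = source, byte 3 = destination. */
#define CTRL_HDR_LEN 4

/* Walks a buffer holding interleaved control messages and counts those
 * addressed to dst. Returns -1 if a malformed or truncated message is met.
 * Trailing bytes shorter than a header are ignored. */
static int count_msgs_for(const uint8_t *buf, size_t len, uint8_t dst)
{
    assert(buf != NULL || len == 0);
    size_t off = 0;
    int count = 0;
    while (off + CTRL_HDR_LEN <= len) {
        size_t size = (size_t)buf[off] | ((size_t)buf[off + 1] << 8);
        if (size < CTRL_HDR_LEN || off + size > len)
            return -1;                 /* malformed or truncated message */
        if (buf[off + 3] == dst)
            count++;
        off += size;                   /* the size field locates the next message */
    }
    return count;
}
```

The size field is what makes interleaving safe: without it the CPU could not find message boundaries in the shared pool, which is also why convention 2 forbids interleaving data messages that carry no such framing.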



12.7.1 Host PC to ISIMaster SoPEC communication

When considering SCB map configurations we always assume that the ISIMaster is a SoPEC device, in
particular the SoPEC connected to the USB bus (and receiving data on USB endpoint 2, 3 or 4), rather than
an ISI-Bridge chip. ISI-Bridge chips are likely to have something similar to an SCB map and the following
information should broadly apply to an ISI-Bridge, but we focus here on an ISIMaster SoPEC for clarity.

As the ISIMaster SoPEC represents the printer on the USB bus it is required by the USB specification to
have a dedicated control endpoint, EP0. At boot time the ISIMaster SoPEC will also require a bulk data
endpoint to facilitate the transfer of program code from the host PC. The simplest SCB map configuration,
i.e. for a single stand-alone SoPEC, is sufficient for host to ISIMaster SoPEC communication and is shown
in Figure 38. In this configuration all USB control information is exchanged between the host and SoPEC
over EP0 (which is the only bidirectional USB endpoint). SoPEC-specific control information (printer
status, DNC info etc.) is also exchanged over EP0.

All packets sent to the host from SoPEC over EP0 must be written into the EP0 FIFO by the CPU. All
packets sent from the host to SoPEC can be placed in DRAM by the DMA Manager (as is usually the
case) or read directly by the CPU. This asymmetry exists because in a multi-SoPEC environment the CPU
will need to examine all incoming control messages (i.e. messages that have arrived over DMAChannel0)
to ascertain their source and destination (i.e. they could be from an ISISlave and destined for the host),
and so the additional overhead in having the CPU move the short control messages to the EP0 FIFO is
relatively small. Furthermore, we wish to avoid making the SCB more complicated than necessary,
particularly when there is no significant performance gain to be had, as the control traffic will be
relatively low bandwidth.

The above mechanisms are appropriate for the types of communication outlined in sections 12.4.1.1
through 12.4.1.4.
Figure 38. Single SoPEC SCB map configuration and dataflow



12.7.2 Broadcast communication 

An SCB configuration for broadcast communication is shown in Figure 39. This particular configuration is
also the default, post power-on reset, configuration for the ISIMaster SoPEC. USB endpoints EP2 and EP3
are mapped onto ISISubId0 and ISISubId1 of ISIId15 (the broadcast ISIId channel). EP0 is used for
control messages as before and EP1 is a bulk data endpoint for the ISIMaster SoPEC. Depending on what is
convenient for the boot loader software, EP1 may or may not be used during the initial program download,
but EP1 is highly likely to be used for compressed page or other program downloads later. For this reason
it is part of the default configuration. In this setup the USB device configuration will take place, as it
always must, by exchanging messages over the control channel (EP0).

One possible boot mechanism is where the host PC sends the bootloader1 program code to all SoPECs by
broadcasting it over EP3. Each SoPEC in the system then authenticates and executes the bootloader1
program. The ISIMaster SoPEC then polls each ISISlave (over the ISIx.0 channel). Each ISISlave ascertains
its ISIId by sampling the particular GPIO pins required by bootloader1 and reports its presence and
status back to the ISIMaster. The ISIMaster then passes this information back to the host over EP0. Thus
both the host and the ISIMaster have knowledge of the number of SoPECs, and their ISIIds, in the system.
The host may then reconfigure the SCB map to better optimise the SCB resources for the particular
multi-SoPEC system. This could involve simplifying the default configuration to a single SoPEC system
(Figure 38) or remapping the broadcast channels onto DMAChannels in individual ISISlaves.
Figure 39. Default SoPEC SCB map configuration and dataflow



The following steps are required to reconfigure the SCB map from the system depicted in Figure 39 to one
where EP3 is mapped onto ISI1.0:

1) The host PC sends a control message(s) to the ISIMaster SoPEC requesting that USB EP3 be
remapped to ISI1.0.

2) The ISIMaster SoPEC sends a control message to the host PC informing it that EP3 has now been
mapped to ISI1.0 (and therefore the host knows that the previous mapping of ISI15.1 is no longer
available through EP3).

3) The host may now send control messages directly to ISISlave1 without requiring any CPU
intervention on the ISIMaster SoPEC.



12.7.3 Host PC - ISISlave SoPEC communication

The default post-boot (as opposed to post-reset) SCB map configuration for an ISISlave SoPEC is to have
all USB endpoints unconnected. The ISI automatically forwards any data addressed to it (including
broadcast data) to the DMAChannel with the appropriate ISISubId. If the ISIMaster is configured correctly
(e.g. when the ISIMaster is a SoPEC, and that SoPEC's SCB map is configured correctly) then data sent from
the host destined for an ISISlave will be transmitted on the ISI with the correct address. If the ISISlave
has data to send to the host it must do so by sending a control message to the ISIMaster identifying the
host as the intended recipient. It is then the ISIMaster's responsibility to forward this message to the host.

With this configuration the host can communicate with the ISISlave via broadcast messages only and this
is the mechanism by which the bootloader1 program is downloaded. The ISISlave is unable to communicate
with the host (or the ISIMaster) until the bootloader1 program has successfully executed and the
ISISlave has determined what its ISIId is. After the bootloader1 program (and possibly other programs)
has executed, the SCB map of the ISIMaster may be reconfigured to reflect the most appropriate topology
for the particular multi-SoPEC system it is part of.

All communication from an ISISlave to the host is achieved by sending messages via the ISIMaster. The
ISISlave can never initiate communication to the host. If an ISISlave wishes to send a message to the host
it may do one of two things: (a) wait until it is polled by the ISIMaster or (b) indicate in its ISI
acknowledgement packet (sent in response to the reception of an ISI packet specifically addressed to that
ISISlave) that it has a message to send. When the ISIMaster receives the message from the ISISlave it first
examines it to determine the intended destination and will then copy it into the EP0 FIFO for transmission
to the host. The software running on the ISIMaster is responsible for any arbitration between messages from
different sources (including itself) that are all destined for the host.

The above mechanisms are appropriate for the types of communication outlined in sections 12.4.1.5 and
12.4.1.6.

12.7.4 ISIMaster - ISISlave communication

All ISIMaster - ISISlave communication takes place over the ISI. Immediately after reset this can only be
by means of broadcast messages. Once the bootloader1 program has successfully executed on all SoPECs
in a multi-SoPEC system the ISIMaster can communicate with each SoPEC on an individual basis.

If an ISISlave wishes to send a message to the ISIMaster it may do so in response to a ping packet from the
ISIMaster. When the ISIMaster receives the message from the ISISlave it must interpret the message to
determine if the message contains information required to be sent to the host. In the case of the ISIMaster
being a SoPEC, software will transfer the appropriate information into the EP0 FIFO for transmission to
the host.

The above mechanisms are appropriate for the types of communication outlined in sections 12.4.2.3 and 
12.4.2.4. 

12.7.5 ISISlave - ISISlave communication 

ISISlave to ISISlave communication is expected to be limited to two special cases: (a) when the
PrintMaster is not the ISIMaster and (b) when a storage SoPEC is used. When the PrintMaster is not the
ISIMaster it will need to send control messages (and receive responses to these messages) to other
ISISlaves. When a storage SoPEC is present it may need to send data to each SoPEC in the system. All
ISISlave to ISISlave communication will take place in response to ping messages from the ISIMaster.

12.7.6 SCB Map configuration registers 

The SCB map is configured by mapping a USB endpoint onto a data sink. This is performed on a
per-endpoint basis, i.e. each endpoint has a configuration register to allow its data sink to be selected.
Mapping an endpoint onto a data sink does not initiate any data flow; each endpoint/data sink needs to be
enabled by writing to the appropriate configuration registers in the USB controller / ISI / DMA manager.



Table 36. SCB Map configuration registers

Address  Register     #bits Reset  Description
0x100    USBEP0Dest   7     0x20   This register determines which of the data
                                   sinks the data arriving in EP0 should be
                                   routed to.
0x104    USBEP1Dest   7     0x21   Data sink for USB EP1
0x108    USBEP2Dest   7     0x3E   Data sink for USB EP2
0x10C    USBEP3Dest   7     0x3F   Data sink for USB EP3
0x110    USBEP4Dest   7     0x23   Data sink for USB EP4



The same encoding is used for each of the USBEPnDest configuration registers and is described in
Table 37. The ISIId register (see Table 30) allows the SCB map to identify data that should be routed to the
DMA Manager as well as, or instead of, to the ISI. The SCB map therefore does not need special fields to
identify the DMAChannels on the ISIMaster SoPEC, as this is taken care of by the SCB hardware. Thus the
USBEP0Dest and USBEP1Dest registers should be programmed with 0x20 and 0x21 (for ISI0.0 and
ISI0.1) respectively to ensure data arriving on these endpoints is moved directly to DRAM.



Table 37. USBEPnDest register

Field Name    Bit(s) Description
DestISISubId  0      Indicates which DMAChannel of the target SoPEC the
                     endpoint is mapped onto:
                     0 = DMAChannel0
                     1 = DMAChannel1
DestISIId     4:1    Denotes the ISIId of the target SoPEC as per Table 35
ChannelEn     5      Enable bit for the DMAChannel:
                     0 = Channel disabled
                     1 = Channel enabled
SequenceBit   6      Sequence bit for packets going from USBEPn to
                     DestISIId.DestISISubId. Every CPU write to this register
                     initialises the value of the sequence bit and this is
                     subsequently updated by the ISI after every successful
                     long packet transmission.
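The Table 37 layout can be expressed as a small encoder. The C sketch below uses a hypothetical helper name. It is useful mainly as a consistency check: under this encoding the reset values listed in Table 36 decode to sensible mappings.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes a USBEPnDest value from the fields of Table 37:
 *   bit 0    DestISISubId
 *   bits 4:1 DestISIId
 *   bit 5    ChannelEn
 *   bit 6    SequenceBit (initial value; later updated by the ISI) */
static uint8_t usbep_dest(unsigned isi_id, unsigned isi_subid, int enable, int seq_bit)
{
    assert(isi_id <= 15u && isi_subid <= 1u);
    return (uint8_t)((seq_bit ? 1u << 6 : 0u) |
                     (enable  ? 1u << 5 : 0u) |
                     (isi_id << 1) |
                     isi_subid);
}
```

Under this encoding the Table 36 reset values decode as ISI0.0 (0x20), ISI0.1 (0x21), ISI15.0 (0x3E), ISI15.1 (0x3F) and ISI1.1 (0x23), all with ChannelEn set. This matches the default broadcast mapping of EP2 and EP3 described in section 12.7.2. Remapping EP3 onto ISI1.0, as in the reconfiguration example of that section, would correspond to writing 0x22, assuming the sequence bit is initialised to 0.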



A SoPEC ISIMaster should map as many USB endpoints, under the control of the host, as are required for
the multi-SoPEC system it is part of. As already mentioned, this mapping may be dynamically
reconfigured.

12.7.7 SCB transmit buffer arbitration 

When the SCB transmit buffer has been emptied the SCB control logic will immediately seek to refill it.
As there may be data waiting in the USB endpoint FIFOs and in the CPUISI transmit buffer, it may be
necessary to arbitrate between these data sources. This arbitration is controlled by the SCBTxBuffArb
register, which contains a high priority bit for both the CPU and the USB. If only one of these bits is set then
the corresponding source always has priority. Note that if the CPU is given absolute priority over the USB,
the software filling the CPUISI transmit buffer needs to ensure that sufficient USB traffic is allowed
through. If both bits of the SCBTxBuffArb register have the same value then arbitration will take place on
a round robin basis.
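The arbitration rule above can be sketched as a pure function of the two SCBTxBuffArb bits (Table 38: bit 0 = CPU priority, bit 1 = USB priority) and the previous winner. The names below are hypothetical. The sketch assumes both the CPU and the USB have data pending; if only one is pending it simply wins. Arbitration among several USB endpoints is not modelled.

```c
#include <assert.h>

typedef enum { SRC_CPU, SRC_USB } scb_src_t;

/* Chooses the source that refills the SCB transmit buffer when both the
 * CPUISI transmit buffer and a USB endpoint have data waiting.
 * arb: SCBTxBuffArb value, bit0 = CPU priority, bit1 = USB priority.
 * last_winner is consulted only in the round-robin case. */
static scb_src_t scb_tx_arbitrate(unsigned arb, scb_src_t last_winner)
{
    assert(arb <= 3u);
    int cpu_hi = (int)(arb & 1u);
    int usb_hi = (int)((arb >> 1) & 1u);
    if (cpu_hi && !usb_hi)
        return SRC_CPU;                 /* CPU has absolute priority */
    if (usb_hi && !cpu_hi)
        return SRC_USB;                 /* USB has absolute priority */
    /* both bits equal: alternate between the two sources */
    return last_winner == SRC_CPU ? SRC_USB : SRC_CPU;
}
```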

As the speed at which the SCB transmit buffer can be emptied is at least 5 times greater than the speed at
which it can be filled by USB traffic, the double buffers used for each USB endpoint will not overflow using
the above scheme in normal operation. There are a number of scenarios which could lead to the USB
endpoints being temporarily blocked, such as the CPU having priority, retransmissions on the ISI bus,
channels being enabled (cf. the ChannelEn bit of the USBEPnDest register) with data already in their
associated endpoint FIFOs, or short packets being sent on the USB. Care should be taken to ensure that the
USB bandwidth is efficiently utilised at all times.



12.7.8 SCB Control Block 

The SCB control block is responsible for coordinating access to and between the various sub-blocks in the
SCB. This includes translating between the CPU subsystem bus and the USB native bus protocol, moving
data from the USB endpoint FIFOs into the SCB transmit buffer, moving data from the CPUISI transmit
buffer into the SCB transmit buffer and arbitrating between the CPU and itself for access to the SCB
sub-blocks.



Table 38. SCB control block configuration registers

Address  Register       #bits Reset  Description
0x120    WakeUpEnable   2     0x0    This register is used to gate the propagation of
                                     the USB and ISI reset signals to the CPR block.
                                     Active high.
                                     WakeUpEnable[0]: usb_cpr_reset_n control
                                     WakeUpEnable[1]: isi_cpr_reset_n control
0x124    SCBTxBuffArb   2     0x0    Determines which source has priority when
                                     contention arises in filling the SCBTxBuffer.
                                     When a bit is set, priority is given to the
                                     relevant source.
                                     SCBTxBuffArb[0]: CPU priority
                                     SCBTxBuffArb[1]: USB priority
0x128    SCBDebugSel    10    0x000  Contains the address of the register selected
                                     for debug observation as it would appear on
                                     cpu_adr[11:2]. The contents of the selected
                                     register are output on the scb_cpu_data bus
                                     while cpu_scb_sel is low and
                                     scb_cpu_debug_valid is asserted to indicate
                                     the debug data is valid.
                                     It is expected that a number of pseudo-registers
                                     will be made available for debug observation
                                     and these will be outlined with the
                                     implementation details.



12.8 DMA Manager 

The DMA manager manages the flow of data between the SCB and the embedded DRAM. Whilst the
CPU could be used for the movement of data in a USB1.1 enabled SoPEC, a DMA manager is a more
efficient solution as it will handle data in a more predictable fashion with less latency and requiring less
buffering. Furthermore, a DMA manager is required to support the ISI transfer speed and to ensure that
the SoPEC could be used with a high speed ISI-Bridge chip in the future.

The DMA manager uses two independent channels, one for each ISISubId, to control the movement of
data. Both DMAChannels only support write operations and can transfer data from any USB endpoint and
from the ISI receive buffer. Data is moved at the soonest opportunity to do so and is always moved in
256-bit slices as required by the DIU. When it is not possible to use a 256-bit slice of data (e.g. at the end
of a packet or for a short packet) the DMA manager will still use 256-bit access to the DIU. This means that
for a DIU write (data incoming to the SoPEC) the DMA manager will pad the valid data with zeroes until a
256-bit slice has been filled.
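The zero-padding rule can be illustrated with a short C sketch that splits incoming bytes into 32-byte (256-bit) slices and pads the final partial slice with zeroes. The function name is hypothetical. The sketch models only the padding; the address alignment, endianness handling and DIU handshaking performed by the real DMA manager are not represented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DIU_SLICE_BYTES 32   /* one 256-bit DIU access */

/* Splits len bytes of incoming data into 256-bit slices, zero-padding
 * the final partial slice. out must have room for the returned number
 * of slices, i.e. ceil(len / 32). */
static size_t to_diu_slices(const uint8_t *data, size_t len,
                            uint8_t out[][DIU_SLICE_BYTES])
{
    assert(data != NULL || len == 0);
    size_t slices = (len + DIU_SLICE_BYTES - 1) / DIU_SLICE_BYTES;
    for (size_t i = 0; i < slices; i++) {
        size_t off = i * DIU_SLICE_BYTES;
        size_t n = (len - off < DIU_SLICE_BYTES) ? len - off : DIU_SLICE_BYTES;
        memset(out[i], 0, DIU_SLICE_BYTES);   /* padding for a short slice */
        memcpy(out[i], data + off, n);        /* valid data first */
    }
    return slices;
}
```

A 40-byte packet, for instance, occupies two slices: one full slice and one holding 8 valid bytes followed by 24 zero bytes.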

The DMA manager handles all issues relating to byte/word/longword address alignment, data endianness
and transaction scheduling. It arbitrates between data arriving from the ISI and data arriving from a USB
endpoint on a round robin basis. The greater guaranteed bandwidth available to the DMA manager (50
Mbit/s at the time of writing, although this may need to be increased, especially if a 4-wire ISI bus is used;
see section 20.6 for more details) ensures that the DMA manager is non-blocking.

While the DMA manager performs the work of moving data the CPU controls the destination and relative 
timing of dataflows to and from the DRAM. The management of the DRAM data buffers requires the CPU 
to have accurate and timely visibility of both the DMA and PEP memory usage. In other words when the 
PEP has completed processing of a page band the CPU needs to be aware of the fact that an area of mem- 
ory has been freed up to receive incoming data. The management of these buffers may also be performed 
by the host. 

12.8.1 Circular buffer operation 

The DMA manager supports the use of circular buffers for both DMAChannels. Each circular buffer is
controlled by 5 registers: DMAnBottomAdr, DMAnTopAdr, DMAnMaxAdr, DMAnCurrWPtr and
DMAnIntAdr. The operation of the circular buffers is shown in Figure 40 below.




(a) (b) 

Key: | ~[ Free buffer space 

IW/Sl Filled buffer space (unprocessed data) 

N>vS Buffer space filled since last write to the DMAnlntAdr/DMAnMaxAdr registers 

Figure 40. Circular buffer operation 

Here we see two snapshots of the status of a circular buffer, with (b) occurring sometime after (a) and some
CPU writes occurring in between (a) and (b). These CPU writes are most likely to be as a result of a finished
band interrupt (which frees up buffer space) but could also have occurred in a DMA interrupt service
routine resulting from DMAnIntAdr being hit. The DMA manager will continue filling the free buffer
space depicted in (a), advancing the DMAnCurrWPtr after each write to the DIU. Note that the
DMAnCurrWPtr register always points to the next address the DMA manager will write to. When the DMA
manager reaches the address in DMAnIntAdr (i.e. DMAnCurrWPtr == DMAnIntAdr) it will generate an
interrupt if the DMAnIntAdrMask bit in the DMAMask register is set. The purpose of the DMAnIntAdr
register is to alert the CPU that data (such as a control message or a page or band header) has arrived that it
needs to process. The interrupt routine servicing the DMA interrupt will change the DMAnIntAdr value to
the next location by which data of interest to the CPU will have arrived.

In the scenario shown in Figure 40 the CPU has determined (most likely as a result of a finished band
interrupt) that the filled buffer space in (a) has been freed up and is therefore available to receive more
data. The CPU therefore moves the DMAnMaxAdr to the end of the section that has been freed up and
moves the DMAnIntAdr address to an appropriate offset from the DMAnMaxAdr address. The DMA
manager continues to fill the free buffer space and when it reaches the address in DMAnTopAdr it wraps
around to the address in DMAnBottomAdr and continues from there. DMA transfers will continue
indefinitely in this fashion until the DMA manager reaches the address in the DMAnMaxAdr register.
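The pointer behaviour described above can be modelled in a few lines of C. This is a behavioural sketch, not the hardware design. Addresses are treated as 256-bit word indices; DMAnBottomAdr and DMAnTopAdr are inclusive (section 12.8.3); the write pointer wraps from top to bottom; and "reached" events are flagged when the pointer arrives at DMAnIntAdr or DMAnMaxAdr. Writes stall at DMAnMaxAdr because that location itself is never written. Interrupt masking (DMAnIntAdrMask) is not modelled, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Word addresses in 256-bit units; bottom and top are inclusive. */
typedef struct {
    uint32_t bottom, top, cur, int_adr, max_adr;
} dma_ring_t;

enum { DMA_WROTE = 1, DMA_HIT_INT = 2, DMA_HIT_MAX = 4 };

/* Attempts one 256-bit write. Returns 0 if stalled at DMAnMaxAdr,
 * otherwise DMA_WROTE plus any "reached" event flags. */
static int dma_ring_write(dma_ring_t *r)
{
    assert(r->bottom <= r->top && r->cur >= r->bottom && r->cur <= r->top);
    if (r->cur == r->max_adr)
        return 0;                              /* the max location is never written */
    /* ...the slice at r->cur would be written to DRAM here... */
    r->cur = (r->cur == r->top) ? r->bottom : r->cur + 1;  /* wrap after top */
    int ev = DMA_WROTE;
    if (r->cur == r->int_adr)
        ev |= DMA_HIT_INT;                     /* reached DMAnIntAdr */
    if (r->cur == r->max_adr)
        ev |= DMA_HIT_MAX;                     /* reached DMAnMaxAdr */
    return ev;
}
```

With bottom = 0x100, top = 0x103, a start pointer of 0x102, DMAnIntAdr = 0x100 and DMAnMaxAdr = 0x101, the second write wraps the pointer to 0x100 and hits the interrupt address. The third write reaches the maximum address, and the fourth stalls until the CPU moves DMAnMaxAdr.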

The circular buffer is initialised by writing the top and bottom addresses to the DMAnTopAdr and
DMAnBottomAdr registers, writing the start address (which does not have to be the same as the
DMAnBottomAdr even though it usually will be) to the DMAnCurrWPtr register, and writing appropriate
addresses to the DMAnIntAdr and DMAnMaxAdr registers. The DMA operation will not commence until a
1 has been written to the relevant bit of the DMAChanEn register.
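A minimal C sketch of this initialisation order for DMAChannel0 follows, using the register offsets from Table 39. Here `reg_write` is a hypothetical stand-in for a memory-mapped store and only logs the writes so that their order can be checked. The address arguments are assumed to be supplied already in the 17-bit, 256-bit-aligned form the registers hold; no conversion from byte addresses is attempted. The essential point taken from the text is that DMAChanEn is written last.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets from Table 39 (DMAChannel0 registers and DMAChanEn) */
#define DMA0_BOTTOM_ADR 0x200u
#define DMA0_TOP_ADR    0x204u
#define DMA0_CURR_WPTR  0x208u
#define DMA0_INT_ADR    0x20Cu
#define DMA0_MAX_ADR    0x210u
#define DMA_CHAN_EN     0x230u

/* Hypothetical register-write hook: logs writes instead of performing
 * a memory-mapped store, so the ordering can be inspected. */
typedef struct { uint32_t off, val; } reg_write_t;
static reg_write_t wlog[16];
static int wcount;

static void reg_write(uint32_t off, uint32_t val)
{
    assert(wcount < 16);
    wlog[wcount++] = (reg_write_t){ off, val };
}

/* Initialises DMAChannel0's circular buffer, then enables the channel.
 * chan_en is the current DMAChanEn value, so DMAChannel1's enable bit
 * is preserved. */
static void dma0_ring_init(uint32_t bottom, uint32_t top, uint32_t start,
                           uint32_t int_adr, uint32_t max_adr, uint32_t chan_en)
{
    assert(bottom <= start && start <= top);
    reg_write(DMA0_TOP_ADR, top);
    reg_write(DMA0_BOTTOM_ADR, bottom);
    reg_write(DMA0_CURR_WPTR, start);      /* usually, but not necessarily, == bottom */
    reg_write(DMA0_INT_ADR, int_adr);
    reg_write(DMA0_MAX_ADR, max_adr);
    reg_write(DMA_CHAN_EN, chan_en | 1u);  /* DMAChanEn[0] enables DMAChannel0 */
}
```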

While it is possible to modify the DMAnTopAdr and DMAnBottomAdr registers after the DMA has started 
it should be done with caution. The DMAnCurrWPtr register should not be written to while the 
DMAChannel is in operation. DMA operation may be stalled at any time by clearing the appropriate bit of 
the DMAChanEn register or by disabling an SCB mapping or ISI receive operation. 

12.8.2 DMA manager DRAM bandwidth requirements 

The DIU must guarantee the SCB enough bandwidth to ensure that neither a USB endpoint FIFO nor the
ISI receive buffer can overrun. For example, to facilitate bursty 32 Mbit/s transfers a SoPEC with a
64-byte ISI receive buffer would need to be able to transfer 256 bits every 1280 cycles (@160 MHz). This is
in addition to the USB transactions targeted at the ISIMaster SoPEC, which may be in the region of
8-9 Mbit/s. While USB has a backpressure mechanism, SoPEC should strive to obtain optimum USB
bandwidth utilization and so USB backpressuring should only be used as a last resort. The DIU currently
guarantees 50 Mbit/s to the SCB and more bandwidth will be available when other DIU requestors do not
take their slots. This is sufficient for the SCB's requirements.
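The figures above follow from simple arithmetic. A 64-byte receive buffer holds two 256-bit slices, so, presumably, one slice must be drained in the time the ISI takes to deliver the next. At 32 Mbit/s a 256-bit slice takes 8 µs to arrive, which is 1280 cycles at 160 MHz. The C sketch below (hypothetical function names) reproduces this calculation and the remaining headroom when the upper USB estimate of 9 Mbit/s is added to the 32 Mbit/s ISI load.

```c
#include <assert.h>

/* Cycles available to drain one 256-bit slice before the next one
 * arrives, given the incoming rate (Mbit/s) and the clock (MHz). */
static double cycles_per_slice(double rate_mbit_s, double clk_mhz)
{
    assert(rate_mbit_s > 0.0 && clk_mhz > 0.0);
    double slice_time_us = 256.0 / rate_mbit_s;   /* bits / (Mbit/s) = microseconds */
    return slice_time_us * clk_mhz;               /* microseconds * MHz = cycles */
}

/* Guaranteed SCB bandwidth left over after the ISI and USB loads (Mbit/s). */
static double scb_headroom(double guaranteed, double isi_load, double usb_load)
{
    return guaranteed - isi_load - usb_load;
}
```

`cycles_per_slice(32.0, 160.0)` gives 1280.0 and `scb_headroom(50.0, 32.0, 9.0)` gives 9.0, so the guaranteed 50 Mbit/s covers the combined worst-case load with some margin.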

12.8.3 DMA manager configuration registers 

All of the circular buffer registers are 256-bit word aligned as required by the DIU. The DMAnBottomAdr
and DMAnTopAdr registers are inclusive, i.e. the addresses contained in those registers form part of the
circular buffer. The DMAnCurrWPtr always points to the next location the DMA manager will write to, so
interrupts are generated whenever the DMA manager reaches the address in either the DMAnIntAdr or
DMAnMaxAdr registers rather than when it actually writes to these locations. It therefore cannot write to
the location in the DMAnMaxAdr register.



Table 39. DMA Manager Configuration Registers

Address  Register        #bits Reset    Description
0x200    DMA0BottomAdr   17    0x0_0000 The 256-bit aligned DRAM address of the
                                        bottom of the circular buffer serviced by
                                        DMAChannel0
0x204    DMA0TopAdr      17    0x0_0000 The 256-bit aligned DRAM address of the top
                                        of the circular buffer serviced by
                                        DMAChannel0
0x208    DMA0CurrWPtr    17    0x0_0000 The 256-bit aligned DRAM address of the next
                                        location DMAChannel0 will write to. This
                                        register is set by the CPU at the start of a
                                        DMA operation and dynamically updated by the
                                        DMA manager during the operation.
0x20C    DMA0IntAdr      17    0x0_0000 The 256-bit aligned DRAM address of the
                                        location that will trigger an interrupt when
                                        reached by DMAChannel0.
0x210    DMA0MaxAdr      17    0x0_0000 The 256-bit aligned DRAM address of the last
                                        free location in the DMAChannel0 circular
                                        buffer. DMAChannel0 transfers will stop when
                                        they reach this address.
0x214    DMA0SeqBit      1     0x0      Sequence bit for DMAChannel0. This bit may be
                                        initialised by the CPU but is updated by the
                                        ISI each time an error-free long packet is
                                        received.
0x218    DMA1BottomAdr   17    0x0_0000 The 256-bit aligned DRAM address of the
                                        bottom of the circular buffer serviced by
                                        DMAChannel1
0x21C    DMA1TopAdr      17    0x0_0000 The 256-bit aligned DRAM address of the top
                                        of the circular buffer serviced by
                                        DMAChannel1
0x220    DMA1CurrWPtr    17    0x0_0000 The 256-bit aligned DRAM address of the next
                                        location DMAChannel1 will write to. This
                                        register is set by the CPU at the start of a
                                        DMA operation and dynamically updated by the
                                        DMA manager during the operation.
0x224    DMA1IntAdr      17    0x0_0000 The 256-bit aligned DRAM address of the
                                        location that will trigger an interrupt when
                                        reached by DMAChannel1.
0x228    DMA1MaxAdr      17    0x0_0000 The 256-bit aligned DRAM address of the last
                                        free location in the DMAChannel1 circular
                                        buffer. DMAChannel1 transfers will stop when
                                        they reach this address.
0x22C    DMA1SeqBit      1     0x0      Sequence bit for DMAChannel1. This bit may be
                                        initialised by the CPU but is updated by the
                                        ISI each time an error-free long packet is
                                        received.
0x230    DMAChanEn       2     0x0      Enable DMA operation on a per channel basis.
                                        Active high.
                                        DMAChanEn[0]: Enable DMAChannel0
                                        DMAChanEn[1]: Enable DMAChannel1
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Table 39. DMA Manager Configuration Registers 



0x234 



0x238 





DMA status reolster. See section 12.6^.1. 
Thfs register is Readonly, 



DMA mask register. See section 12.8.3.2 
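The per-channel registers in Table 39 repeat with a fixed 0x18-byte stride, so driver code can compute addresses rather than list them. Below is a small Python sketch that uses the Table 39 addresses directly. The helper name and the field strings are illustrative only.

```python
from typing import Optional

DMA_BASE = 0x200
CHANNEL_STRIDE = 0x18  # DMA1BottomAdr (0x218) minus DMA0BottomAdr (0x200)

# Order of the per-channel registers inside each 0x18-byte group (Table 39).
PER_CHANNEL = ["BottomAdr", "TopAdr", "CurrWPtr", "IntAdr", "MaxAdr", "SeqBit"]

# Registers shared by both channels.
SHARED = {"DMAChanEn": 0x230, "DMAStatus": 0x234, "DMAMask": 0x238}


def dma_reg_addr(field: str, channel: Optional[int] = None) -> int:
    """Return the Table 39 address of DMA<channel><field> or of a shared register."""
    if field in SHARED:
        return SHARED[field]
    if channel not in (0, 1):
        raise ValueError("per-channel registers need channel 0 or 1")
    return DMA_BASE + channel * CHANNEL_STRIDE + 4 * PER_CHANNEL.index(field)
```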



12.8.3.1 DMAStatus register 

The contents of the DMAStatus register are read-only to the CPU. The status bits are not sticky bits, i.e. they reflect the 'live' status of the channel. Status bits can only be cleared by writing to the relevant DMAnIntAdr or DMAnMaxAdr register.



Table 40. DMA Status Register

Field name | Bit | Description
DMAChannel0IntAdrHit | 0 | DMAChannel0 has reached the address contained in the DMA0IntAdr register
DMAChannel0MaxAdrHit | 1 | DMAChannel0 has reached the address contained in the DMA0MaxAdr register
DMAChannel1IntAdrHit | 2 | DMAChannel1 has reached the address contained in the DMA1IntAdr register
DMAChannel1MaxAdrHit | 3 | DMAChannel1 has reached the address contained in the DMA1MaxAdr register



12.8.3.2 DMAMask register 

All bits of the DMAMask register are both readable and writable by the CPU. The DMA manager cannot alter the value of this register. All interrupts are edge sensitive, i.e. the DMA manager generates a dma_icu_irq pulse each time a status bit goes high while the corresponding mask bit is enabled.



Table 41. DMA Manager Mask Register

Field name | Bit | Description
DMAChannel0IntAdrHitMask | 0 | 1 = Generate an interrupt when the DMAChannel0IntAdrHit status bit goes high; 0 = Do not generate an interrupt when the DMAChannel0IntAdrHit status bit goes high
DMAChannel0MaxAdrHitMask | 1 | 1 = Generate an interrupt when the DMAChannel0MaxAdrHit status bit goes high; 0 = Do not generate an interrupt when the DMAChannel0MaxAdrHit status bit goes high
DMAChannel1IntAdrHitMask | 2 | As per DMAChannel0IntAdrHitMask
DMAChannel1MaxAdrHitMask | 3 | As per DMAChannel0MaxAdrHitMask
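The status bits are live levels and the interrupt is edge sensitive, so a masked status bit must go from 0 to 1 to produce a new dma_icu_irq pulse. Below is a minimal Python sketch. It assumes the hardware compares the status of consecutive cycles, and the function and constant names are hypothetical.

```python
# Status/mask bit positions from Table 40 and Table 41.
CH0_INT_ADR_HIT = 1 << 0
CH0_MAX_ADR_HIT = 1 << 1
CH1_INT_ADR_HIT = 1 << 2
CH1_MAX_ADR_HIT = 1 << 3
ALL_BITS = 0xF


def irq_pulse(prev_status: int, status: int, mask: int) -> bool:
    """True if any enabled status bit rose from 0 to 1 since the last cycle."""
    rising = status & ~prev_status & ALL_BITS
    return (rising & mask) != 0
```

A bit that stays high produces no further pulses. Because the bits are not sticky, writing a new DMAnIntAdr or DMAnMaxAdr clears the bit, and a later hit can then raise a fresh interrupt.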



12.9 SCB Implementation 

This section is still a work in progress; the information here should be ignored as it refers to an earlier version of the SCB.

Characteristics of the data channels:

USB: Packets should be moved sequentially out of the endpoint FIFOs. The USB is the slowest component in the SCB, but its bandwidth is the most precious. Both the DMA and the ISI can transfer data (50 and 40 Mbps respectively) much faster than the USB can receive it (12 Mbps peak rate), so no flow-control problems will occur due to a speed mismatch. If one of the DMA or ISI data sinks becomes blocked or inactive, the USB controller will assert backpressure (by NAKing packets) once the double buffer for the associated endpoint is full. Other endpoints remain active in this scenario, and the DMA and ISI can still transfer data at their peak rates. The worst case is when all endpoints have their double buffers filled (because all the data sinks were blocked or disabled) and then all data sinks become available again. In this case the backlog will be fully cleared in 3 USB 64-byte packet times.

ISI: The ISI can support simultaneous reception and transmission of packets. ISI packets should be transferred sequentially in either direction. The ISI is expected to handle the packet header and trailer (if any is used for error detection) in both directions, i.e. only raw payload data is routed through the SCB map.

DMA: The DMA channels are unidirectional, but their direction (whether they transfer data to or from DRAM) is programmable. Each DMA transaction to DRAM is 256 bits wide, but not all 256 bits are always valid. When a transfer of less than 256 bits is required, the DMA manager pads the remaining bits of the 256-bit word with zeroes in the case of a write to DRAM, or discards the unnecessary bits in the case of a DRAM read. Can we get by with single buffering (256 bits each way, or maybe even 256 bits in all) for the DRAM manager?
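The zero-padding and discard behaviour can be sketched in Python as below. The sketch assumes byte-granular transfers whose valid data occupies the first bytes of the 256-bit word. Both the byte granularity and the function names are assumptions, since this section describes an earlier SCB design.

```python
WORD_BYTES = 32  # one 256-bit DRAM word


def pad_for_dram_write(payload: bytes) -> bytes:
    """Zero-pad a partial payload up to a full 256-bit word before writing."""
    if len(payload) > WORD_BYTES:
        raise ValueError("payload larger than one 256-bit word")
    return payload + bytes(WORD_BYTES - len(payload))


def trim_dram_read(word: bytes, valid_bytes: int) -> bytes:
    """Keep only the valid bytes of a 256-bit word read from DRAM."""
    if len(word) != WORD_BYTES or not 0 <= valid_bytes <= WORD_BYTES:
        raise ValueError("expected a 32-byte word and 0..32 valid bytes")
    return word[:valid_bytes]
```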
Figure 41. SCB Switch block diagram (signals shown include dma_scbs_data, scbs_dma_data, dma_scbs_cntrl, usb_ep_rdy[2:0] and usb_rx_data)



SCB Switch pseudocode: 

const no_data_sinks = 12

for i = 1 to no_data_sinks
    if (i <= 2) then
        sink_data       is dma_din
        sink_rdy        is dma_din_rdy
        sink_data_valid is dma_din_valid
        sink_id         is dma_din_id
    else
        sink_data       is isi_tx_data
        sink_rdy        is isi_tx_rdy
        sink_data_valid is isi_tx_data_valid
        sink_id         is isi_tx_data_id

    if (data_src_reg(i) != 0) then   // Each data sink has an associated data source
                                     // register. A non-zero value means the sink is enabled
        if ((data_src_reg(i) & 0xF0) == 0x10) then   // A USB endpoint is the data source
            if ((usb_ep_rdy[4] == 1) AND (usb_ep_rdy[3:0] == data_src_reg(i)[3:0])) then
                // there is data waiting in the EP FIFO
                while ((usb_data_valid == 1) AND (sink_rdy == 1) AND clocktick)
                    sink_data = usb_rx_data
                    sink_data_valid = 1
                    if (i <= 2) then   // The sink is a DMAChannel
                        sink_id[1] = 1
                        sink_id[0] = i - 1
                    else               // The sink is an ISI channel
                        sink_id[5] = 1
                        sink_id[4:0] = i - 3
            else   // There is no data ready to go
                sink_data_valid = 0

        elsif ((data_src_reg(i) & 0xF0) == 0x20) then   // The ISI is the data source
            if (isi_data_rdy_id[3:0] == data_src_reg(i)[3:0]) then
                // there is data waiting in the ISI receive FIFO for this ISISubId
                while ((isi_rx_data_valid == 1) AND (sink_rdy == 1) AND clocktick)
                    sink_data = isi_rx_data
                    sink_data_valid = 1
                    if (i <= 2) then   // The sink is a DMAChannel
                        sink_id[1] = 1
                        sink_id[0] = i - 1
                    else               // The sink is an ISI channel
                        sink_id[5] = 1
                        sink_id[4:0] = i - 3
            else   // There is no data ready to go
                sink_data_valid = 0

        elsif ((data_src_reg(i) & 0xF0) == 0x30) then   // The DMA is the data source
            if (dma_dout_rdy_id[0] == data_src_reg(i)[0]) then
                // there is data waiting in the relevant DMA buffer for this sink
                while ((dma_dout_valid == 1) AND (sink_rdy == 1) AND clocktick)
                    sink_data = dma_dout
                    sink_data_valid = 1
                    if (i <= 2) then   // The sink is a DMA channel
                        sink_id[1] = 1
                        sink_id[0] = i - 1
                    else               // The sink is an ISI channel
                        sink_id[5] = 1
                        sink_id[4:0] = i - 3
            else   // There is no data ready to go
                sink_data_valid = 0



The above pseudocode has a few shortcomings, particularly if our data buses are not all the same size, but it shows the basic functionality the switch is supposed to offer. The main loop of the pseudocode (for i = 1 to no_data_sinks) dictates what happens within one timeslot. The timeslots take as long as required to complete and loop around endlessly. The msb of the usb_ep_rdy[4:0], isi_data_rdy_id[5:0] and dma_dout_rdy_id[1:0] signals is used to indicate that data is available in the relevant block.
13 General Purpose IO (GPIO)

13.1 Overview 

The General Purpose IO block (GPIO) is responsible for control and interfacing of GPIO pins to the rest of 
the SoPEC system. It provides easily programmable control logic to simplify control of GPIO functions. 
In all there are 14 GPIO pins, some of which have special functions. These functions are:

• 4 Motor control IOs, internally pulled down

• 4 General purpose high-drive pulsed IOs capable of driving LEDs

• 4 Open drain IOs used for LSS interfaces

• 2 Normal drive IOs used for the ISI interface in Multi-SoPEC mode

Each pin can be configured in either input or output mode, and each pin is independently controlled. A programmable de-glitching circuit exists for all input pins. Each input is a Schmitt trigger, which increases noise immunity if the input is used without the de-glitch circuit. Table 42 below shows how the above functions, and their alternate use in a slave SoPEC, map to the GPIO pins.

Table 42. GPIO pin functionality

Pin | Function
gpio[3:0] | Motor control pins / general purpose IO
gpio[7:4] | LED driver pins / general purpose IO
gpio[11:8] | LSS interface pins / general purpose IO
gpio[13:12] | ISI interface pins / general purpose IO



13.2 MOTOR CONTROL 

The motor control pins can be directly controlled by the CPU, or the motor control logic can be used to generate the phase pulses for the stepper motors. The controller consists of two central counters from which the control pins are derived. The central counters have several registers (see Table 44) used to configure the cycle period, the phase, the duty cycle, and the counter granularity.

There are two motor master counters (0 and 1) with identical features. The periods of the master counters are defined by the MotorMasterClkPeriod[1:0] and MotorMasterClkSrc registers, i.e. both master counters are derived from the same MotorMasterClkSrc. The MotorMasterClkSrc defines the timing pulses used by the master counters to determine the timing period. The MotorMasterClkSrc can select 1 µs, 100 µs, 10 ms or pclk timing pulses as the clock source.

The MotorMasterClkPeriod[1:0] registers are set to the number of timing pulses required before the timing period restarts. Each master counter is set to the relevant MotorMasterClkPeriod value and counts down by one each time a timing pulse is received.

The master counters reset to the MotorMasterClkPeriod value and count down. Once the value hits zero, a new value is reloaded from the MotorMasterClkPeriod[1:0] registers. This ensures that no master clock glitch is generated when the clock period is changed.

Each of the IO pins for the motor controller is derived from the master counters, and each pin has independent configuration registers. The MotorMasterClkSelect[3:0] registers define which of the two master counters is used as the source for each motor control pin. The master counter value is compared with the configured MotorCtrlHigh and MotorCtrlLow registers. If the count equals the MotorCtrlHigh value, the motor control pin is set to 1. If the count equals the MotorCtrlLow value, the motor control pin is set to 0.

This allows the phase and duty cycle of the motor control pins to be varied at pclk granularity.
The motor control generators can be paused at the end of a clock period by setting the MotorMasterClockEnable register to zero. This allows the CPU to reconfigure the motor controller without causing a glitch on the output pins.
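The counter and compare behaviour for one pin can be modelled in a few lines of Python. This is a sketch, not the RTL, and rests on three assumptions:

- the zero count lasts one timing pulse, so a cycle is period + 1 pulses;
- the pin starts at 0;
- MotorCtrlHigh wins if both compare values are equal.

`motor_pin_waveform` is a hypothetical name.

```python
def motor_pin_waveform(period: int, high: int, low: int, n_pulses: int, pin: int = 0) -> list:
    """Sample one motor control pin once per timing pulse.

    The master counter loads `period`, counts down by one per pulse and reloads
    `period` after reaching zero. The pin is set to 1 when the count equals
    `high` (MotorCtrlHigh) and cleared to 0 when it equals `low` (MotorCtrlLow).
    """
    count = period
    samples = []
    for _ in range(n_pulses):
        if count == high:
            pin = 1
        elif count == low:
            pin = 0
        samples.append(pin)
        count = period if count == 0 else count - 1
    return samples
```

With period 7, MotorCtrlHigh = 6 and MotorCtrlLow = 2, the pin is high for 4 of every 8 pulses. Moving both compare values together shifts the phase, while changing their difference changes the duty cycle.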

13.3 LED CONTROL 

LED lifetime and brightness can be improved, and power consumption reduced, by driving the LEDs with a pulsed rather than a DC signal. The source clock for each of the LED pins is a 7.8 kHz (128 µs period) clock generated from the 1 µs clock pulse from the Timers block. The LEDDutySelect registers are used to create a signal with the desired waveform. Unpulsed operation of the LED pins can be achieved by using CPU IO direct control. By default the LED pins are controlled by the LED control logic.
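The 7.8 kHz figure follows from counting 1 µs pulses with a counter that wraps every 128 counts. Below is a short Python check of the arithmetic. The 7-bit counter width and the division of the period into eight cnt[6:4] segments are taken from the LED pulse generator description in section 13.10.5.

```python
COUNTER_BITS = 7      # LED pulse counter width
TICK_US = 1           # counter advances on each 1 us timer pulse

period_us = (1 << COUNTER_BITS) * TICK_US   # 128 us
freq_hz = 1e6 / period_us                   # 7812.5 Hz, quoted as 7.8 kHz
segment_us = period_us // 8                 # one cnt[6:4] segment = 16 us
```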



Figure 42. Duty Cycle Select (waveforms for the master clock and for LEDDutySelect = 0 to 7)



13.4 LSS INTERFACE VIA GPIO 

In some SoPEC system configurations one or more of the LSS interfaces may not be used. Unused LSS interface pins can be reused as general IO pins by configuring the CpuIOCtrl register. When a bit in CpuIOCtrl is set, the corresponding pin is controlled by the CPU registers; otherwise the pin is controlled by the LSS block. By default the LSS controls GPIO pins 11 to 8.

13.5 ISI INTERFACE VIA GPIO

In Multi-SoPEC mode the SCB block (in particular the ISI sub-block) requires direct access to and from the gpio[12] and gpio[13] pins. Control of the ISI interface pins is determined by the CpuIOCtrl register.

When a bit in the CpuIOCtrl is set the corresponding pin is controlled by the CPU registers, otherwise the 
pin is controlled by the ISI block directly. By default the pins are directly controlled by the ISI block. 

In single SoPEC systems the pins can be re-used by the GPIO. 

13.6 CPU GPIO CONTROL 

The CPU can assume direct control of any (or all) of the IO pins individually. On a per-pin basis, the CPU turns on direct access to a pin by setting the corresponding bit in the CpuIOCtrl register. Once set, the IO pin assumes the direction specified by the CpuIODirection register. In output mode, the value in the CpuIOOut register is directly reflected to the output driver. In input mode, the status of the input pin can be read in either its direct or its de-glitched form, by reading CpuIOIn or CpuIOInDeglitch respectively. When writing to the CpuIOOut register, the top bits of the register (bits 29 to 16) are used to filter access to the lower bits (13 to 0).
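Bits 29:16 act as a per-pin write-enable mask (0 enables the write, 1 masks it, per Table 44). Software can therefore change one pin without a read-modify-write. Below is a minimal Python sketch of the register update and of building such a write value. The helper names are hypothetical.

```python
PIN_MASK = 0x3FFF  # bits 13:0, one per GPIO pin


def cpu_io_out_write(current: int, write_data: int) -> int:
    """New CpuIOOut[13:0] after a 32-bit CPU write of `write_data`.

    Bit (16 + n) of the write data is the mask for pin n:
    0 lets the write through, 1 leaves pin n unchanged.
    """
    enable = ~(write_data >> 16) & PIN_MASK
    return (write_data & enable) | (current & ~enable & PIN_MASK)


def single_pin_write(pin: int, level: int) -> int:
    """Write value that changes only `pin`, masking every other pin."""
    others = PIN_MASK & ~(1 << pin)
    return (others << 16) | ((level & 1) << pin)
```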

13.7 Programmable de-glitching logic

Each IO pin can be filtered through a de-glitching logic circuit. The circuit can be configured to sample the IO pin for a predetermined time before concluding that the pin is in a particular state. The exact sampling length is configurable, but each GPIO pin must use one of two configured values (selected by DeGlitchSelect). The sampling length is the same for both high and low states. DeGlitchCount is programmed with the number of system time units for which a state must be valid before it is passed on. The time units are selected by DeGlitchClkSrc and can be 1 µs, 100 µs, 10 ms or pclk pulses.

For example, if DeGlitchCount is set to 10 and DeGlitchClkSrc is set to 3, an input pin (one of gpio[13:0]) must consistently retain its value for 10 system clock cycles (pclk) before the input state is propagated from CpuIOIn to CpuIOInDeglitch.
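Below is a minimal Python model of one de-glitch channel, taking one input sample per selected time unit. The exact point at which the output changes is an assumption: here it happens on the DeGlitchCount-th consecutive differing sample. `deglitch` is an illustrative name.

```python
def deglitch(samples, count: int, out: int = 0) -> list:
    """Return the filtered output for a stream of 0/1 input samples.

    The output only follows the input once the input has differed from the
    current output for `count` consecutive samples.
    """
    run = 0  # consecutive samples that differ from the current output
    filtered = []
    for s in samples:
        run = run + 1 if s != out else 0
        if run >= count:
            out, run = s, 0
        filtered.append(out)
    return filtered
```

In the first test case, the two-sample pulse at the start is rejected because it is interrupted before reaching a count of 3.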

13.8 Interrupt generation 

Any of the GPIO pins can generate an interrupt from the raw or deglitched version of the input pin. There 
are 14 possible interrupt sources from the GPIO to the interrupt controller, one interrupt per input pin. The 
InterruptSrcSelect register determines whether the raw input or the deglitched version is used as the inter- 
rupt source. 

The interrupt type, masking and priority can be programmed in the interrupt controller. 

13.9 Frequency ANALYSER 

The frequency analyser measures the duration between successive positive edges on an input pin. It reports the last period measured (FreqAnaLastPeriod) and a running average period (FreqAnaAverage).

The running average is updated each time a new positive edge is detected, and is calculated as:
FreqAnaAverage = (FreqAnaAverage / 8) * 7 + FreqAnaLastPeriod / 8

The analyser can be used with any input pin (or its deglitched form), but only one pin can be selected at a time. The pin is selected by FreqAnaPinSelect, and its deglitched form can be selected by FreqAnaPinFormSelect.
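The running-average update can be sketched in Python. The rounding of the hardware divisions is not specified here, so this sketch assumes truncating integer division. `update_average` and `average_over` are hypothetical names.

```python
def update_average(avg: int, last_period: int) -> int:
    """One FreqAnaAverage update, assuming truncating integer division."""
    return (avg // 8) * 7 + last_period // 8


def average_over(periods, avg: int = 0) -> int:
    """Apply one update per measured period, starting from `avg`."""
    for p in periods:
        avg = update_average(avg, p)
    return avg
```

Under the truncation assumption the filter can settle below the true period. A constant 800-count period applied from FreqAnaAverage = 0 stops at 751, because at that point (751 / 8) * 7 + 100 truncates back to 751. If the hardware rounds or keeps extra fraction bits, the result will differ.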
13.10 Implementation

13.10.1 Definitions of I/O



Table 43. I/O definition

Port name | Pins | I/O | Description
Clocks and Resets
pclk | 1 | In | System clock
prst_n | 1 | In | System reset, synchronous active low
tim_pulse[2:0] | 3 | In | Timers block generated timing pulses. 0 - 1 µs pulse; 1 - 100 µs pulse; 2 - 10 ms pulse
CPU Interface
cpu_addr[7:2] | 6 | In | CPU address bus. Only 6 bits are required to decode the address space for this block
cpu_dataout[31:0] | 32 | In | Shared write data bus from the CPU
gpio_cpu_data[31:0] | 32 | Out | Read data bus to the CPU
cpu_rwn | 1 | In | Common read/not-write signal from the CPU
cpu_gpio_sel | 1 | In | Block select from the CPU. When cpu_gpio_sel is high, both cpu_addr and cpu_dataout are valid
gpio_cpu_rdy | 1 | Out | Ready signal to the CPU. When gpio_cpu_rdy is high it indicates the last cycle of the access. For a write cycle this means cpu_dataout has been registered by the GPIO block, and for a read cycle this means the data on gpio_cpu_data is valid.
gpio_cpu_berr | 1 | Out | Bus error signal to the CPU indicating an invalid access.
gpio_cpu_debug_valid | 1 | Out | Debug data valid on gpio_cpu_data bus. Active high
cpu_acode[1:0] | 2 | In | CPU access code signals. These decode as follows: 00 - User program access; 01 - User data access; 10 - Supervisor program access; 11 - Supervisor data access
IO Pins
gpio_o[13:0] | 14 | Out | General purpose IO output to IO driver
gpio_i[13:0] | 14 | In | General purpose IO input from IO receiver
gpio_e[13:0] | 14 | Out | General purpose IO output control. Active high driving
GPIO to LSS
lss_gpio_do[1:0] | 2 | In | LSS bus data output. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
gpio_lss_di[1:0] | 2 | Out | LSS bus data input. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
lss_gpio_e[1:0] | 2 | In | LSS bus data output enable, active high. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
lss_gpio_clk[1:0] | 2 | In | LSS bus clock output. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
GPIO to ISI
gpio_isi_din[1:0] | 2 | Out | Input data from IO receivers to ISI
isi_gpio_dout[1:0] | 2 | In | Data output from ISI to IO drivers
isi_gpio_e[1:0] | 2 | In | GPIO ISI pins output enable (active high) from ISI interface
Interrupts
gpio_icu_irq[13:0] | 14 | Out | GPIO pin interrupts
Debug
debug_data_out[16:3] | 14 | In | Output debug data to be muxed on to the GPIO pins
debug_cntrl[16:3] | 14 | In | Control signal for each GPIO-bound debug data line, indicating whether or not the debug data should be selected by the mux



13.10.2 Configuration registers

The configuration registers in the GPIO are programmed via the CPU interface. Refer to section 11.4.3 on page 70 for a description of the protocol and timing diagrams for reading and writing registers in the GPIO. Because addresses in SoPEC are byte aligned and the CPU only supports 32-bit register reads and writes, the lower 2 bits of the CPU address bus are not required to decode the address space for the GPIO. When reading a register that is less than 32 bits wide, zeros should be returned on the upper unused bit(s) of gpio_cpu_data. Table 44 lists the configuration registers in the GPIO block.



Table 44. GPIO Register Definition

Address | Register name | #bits | Reset | Description
CPU IO Control
0x00 | CpuIOCtrl | 14 | 0x0000 | Indicates whether each IO pin is directly controlled by the CPU or not. 0 - Default control; 1 - CPU control
0x04 | CpuIOUserModeMask | 14 | 0x0000 | User mode access mask to CPU GPIO control register. When 1, user access is enabled. One bit per gpio pin. Enables access to CpuIODirection, CpuIOOut, CpuIOIn and CpuIOInDeglitch in user mode if CpuIOCtrl allows CPU access.
0x08 | CpuIOSuperModeMask | 14 | 0x3FFF | Supervisor mode access mask to CPU GPIO control register. When 1, supervisor access is enabled. One bit per gpio pin. Enables access to CpuIODirection, CpuIOOut, CpuIOIn and CpuIOInDeglitch in supervisor mode if CpuIOCtrl allows CPU access.
0x0C | CpuIODirection | 14 | 0x0000 | Indicates the direction of each IO pin when controlled by the CPU. 0 - Input mode; 1 - Output mode
0x10 | CpuIOOut | 30 | 0x0000_0000 | Value used to drive output pins in CPU direct mode. Bits 13:0 - Value to drive on output GPIO pins; bits 15:14 - Reserved (always read as zero); bits 29:16 - Write enable mask for bits 13:0: 0 enables the write, 1 masks the write (always read as zero)
0x14 | CpuIOIn | 14 | External pin value | Value received on each input pin regardless of mode. Read-only register.
0x18 | CpuIOInDeglitch | 14 | 0x0000 | Deglitched version of the CpuIOIn register. Note that after reset this register will reflect the external pin values 256 pclk cycles after they have stabilized. Read-only register.
Deglitch Control
0x20 to 0x24 | DeGlitchCount[1:0] | 2x8 | 0xFF | De-glitch circuit sample count in DeGlitchClkSrc selected units for pins gpio[13:0]
0x28 to 0x2C | DeGlitchClkSrc[1:0] | 2x2 | 0x3 | Specifies the unit used by the GPIO deglitch circuits: 0 - 1 µs pulse; 1 - 100 µs pulse; 2 - 10 ms pulse; 3 - pclk
0x30 | DeGlitchSelect | 14 | 0x0000 | Specifies which deglitch count (DeGlitchCount) and unit select (DeGlitchClkSrc) should be used to deglitch each GPIO pin. 0 - Specifies DeGlitchCount[0] and DeGlitchClkSrc[0]; 1 - Specifies DeGlitchCount[1] and DeGlitchClkSrc[1]
Motor Control
0x34 | MotorCtrlUserModeEnable | 1 | 0x0 | User mode access enable to motor control configuration registers. When 1, user access is enabled. Enables user access to the MotorMasterClkPeriod, MotorMasterClkSrc, MotorDutySelect, MotorPhaseSelect, MotorMasterClockEnable and MotorMasterClkSelect registers
0x38 to 0x3C | MotorMasterClkPeriod[1:0] | 2x16 | 0x0000 | Specifies the motor controller master clock periods in MotorMasterClkSrc selected units
0x40 | MotorMasterClkSrc | 2 | 0x0 | Specifies the unit used by the motor controller master clock generator: 0 - 1 µs pulse; 1 - 100 µs pulse; 2 - 10 ms pulse; 3 - pclk
0x44 to 0x50 | MotorCtrlHigh[3:0] | 4x16 | 0x0000 | Specifies the low to high transition point in the clock period for each motor control pin.
0x54 to 0x60 | MotorCtrlLow[3:0] | 4x16 | 0xFFFF | Specifies the high to low transition point in the clock period for each motor control pin.
0x64 to 0x70 | MotorMasterClkSelect[3:0] | 4x1 | 0x0 | Specifies which motor master clock should be used as a pin generator source. 0 - Clock derived from MotorMasterClkPeriod[0]; 1 - Clock derived from MotorMasterClkPeriod[1]
0x74 | MotorMasterClockEnable | 2 | 0x0 | Enables the motor master clock counter. When 1, counting is enabled. Bit 0 - Enable motor master clock 0; Bit 1 - Enable motor master clock 1
LED Control
0x78 | LEDCtrlUserModeEnable | 4 | 0x0 | User mode access enable to LED control configuration registers. When 1, user access is enabled. One bit per LEDDutySelect register.
0x7C to 0x88 | LEDDutySelect[3:0] | 4x3 | 0x0 | Specifies the duty cycle for each LED pin. See Figure 42 for encoding details. The LEDDutySelect[3:0] registers determine the duty cycle of the gpio[7:4] pins
Frequency Analyser
0x8C | FreqAnaPinSelect | 4 | 0x00 | Selects which GPIO input should be used for the frequency analyser.
0x90 | FreqAnaPinFormSelect | 1 | 0x0 | Selects whether the frequency analyser should use the raw input or the deglitched form. 0 - Deglitched form of input pin; 1 - Raw form of input pin
0x94 | FreqAnaLastPeriod | 16 | 0x0000 | Frequency analyser last period of selected input pin.
0x98 | FreqAnaAverage | 16 | 0x0000 | Frequency analyser average period of selected input pin.
0x9C | FreqAnaCountInc | 20 | 0x0_0000 | Frequency analyser counter increment amount. For each clock cycle in which no edge is detected on the selected input pin, the accumulator is incremented by this amount.
Miscellaneous
0xA0 | InterruptSrcSelect | 14 | 0x0000 | Interrupt source select. 1 bit per GPIO pin. Determines whether the interrupt source is direct from the input pin or the deglitched version. 1 - Input pin direct; 0 - Deglitched input pin
0xA4 | DebugSelect | 6 | 0x00 | Debug address select. Indicates the address of the register to report on the gpio_cpu_data bus when it is not otherwise being used.
0xA8 to 0xAC | MotorMasterCount | 2x16 | 0x0000 | Motor master clock counter values. Bus 0 - Master clock count 0; Bus 1 - Master clock count 1. Read-only registers
13.10.2.1 Supervisor and user mode access

The configuration registers block examines the CPU access type (cpu_acode signal) and determines whether the access to that particular register is allowed, based on the configured user access registers. If an access is not allowed, the GPIO issues a bus error by asserting the gpio_cpu_berr signal.

Access to the CpuIODirection, CpuIOOut, CpuIOIn and CpuIOInDeglitch registers is filtered by the CpuIOUserModeMask and CpuIOSuperModeMask registers. For each mode, each mask bit controls access to the corresponding bit in the CpuIO* registers. CpuIOUserModeMask filters user data mode access, and CpuIOSuperModeMask filters supervisor data mode access.

The addition of the CpuIOSuperModeMask register helps prevent conflicts between read-modify-write operations in user and supervisor code. For example, a conflict could arise if user code is interrupted during a read-modify-write operation by a supervisor ISR which also modifies the CpuIO* registers.

An attempt to write to a disabled bit in user or supervisor mode is ignored, and an attempt to read a disabled bit returns zero. If no bits are enabled for user mode, access is not allowed in user mode and a bus error results. The same applies to supervisor mode when no supervisor mode bits are enabled.

When writing to the CpuIOOut register, bits 29 to 16 are used to mask the write to CpuIOOut[13:0]. If a mask bit is zero, the write to the corresponding CpuIOOut pin takes effect; otherwise the write to that pin is ignored.

The pseudocode for determining access to the CpuIOOut register is shown below. Similar code could be shown for the CpuIODirection, CpuIOIn and CpuIOInDeglitch registers.

if (cpu_acode == SUPERVISOR_DATA_MODE) then
    // supervisor mode
    if (CpuIOSuperModeMask[13:0] == 0) then
        // access is denied, and bus error
        gpio_cpu_berr = 1
    elsif (cpu_rwn == 1) then
        // read mode, filtered by mask
        gpio_cpu_data[13:0] = (CpuIOOut[13:0] & CpuIOSuperModeMask[13:0])
    else
        // write mode, filtered by mask
        mask[13:0] = ~(cpu_dataout[29:16]) & CpuIOSuperModeMask[13:0]
        CpuIOOut[13:0] = ((cpu_dataout[13:0] & mask[13:0]) |
                          (CpuIOOut[13:0] & ~(mask[13:0])))
elsif (cpu_acode == USER_DATA_MODE) then
    // user data mode
    if (CpuIOUserModeMask[13:0] == 0) then
        // access is denied, and bus error
        gpio_cpu_berr = 1
    elsif (cpu_rwn == 1) then
        // read mode, filtered by mask
        gpio_cpu_data[13:0] = (CpuIOOut[13:0] & CpuIOUserModeMask[13:0])
    else
        // write mode, filtered by mask
        mask[13:0] = ~(cpu_dataout[29:16]) & CpuIOUserModeMask[13:0]
        CpuIOOut[13:0] = ((cpu_dataout[13:0] & mask[13:0]) |
                          (CpuIOOut[13:0] & ~(mask[13:0])))
else
    // access is denied, bus error
    gpio_cpu_berr = 1
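The same decision logic can be expressed as a small Python model of one CpuIOOut access, which may be useful as a reference when testing the register block. It is a sketch under three assumptions:

- a raised `BusError` stands in for asserting gpio_cpu_berr;
- program-access codes are rejected, as in the final `else` branch;
- the CpuIOCtrl check described in Table 44 is not modelled.

All names are hypothetical.

```python
SUPERVISOR_DATA_MODE = 0b11  # cpu_acode encodings from Table 43
USER_DATA_MODE = 0b01
PIN_MASK = 0x3FFF


class BusError(Exception):
    """Stands in for gpio_cpu_berr being asserted."""


def cpu_io_out_access(reg, acode, rwn, dataout, user_mask, super_mask):
    """Return (new_reg, read_data) for one CPU access to CpuIOOut."""
    if acode == SUPERVISOR_DATA_MODE:
        mode_mask = super_mask & PIN_MASK
    elif acode == USER_DATA_MODE:
        mode_mask = user_mask & PIN_MASK
    else:
        raise BusError("program accesses are not allowed")
    if mode_mask == 0:
        raise BusError("no bits enabled for this mode")
    if rwn:
        return reg, reg & mode_mask          # disabled bits read as zero
    # Bits 29:16 are a write-enable mask: 0 enables the write to that pin.
    mask = ~(dataout >> 16) & mode_mask
    new_reg = (dataout & mask) | (reg & ~mask & PIN_MASK)
    return new_reg, None
```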
Table 45 details the access modes allowed for registers in the GPIO block. In supervisor mode all registers are accessible. In user mode, forbidden accesses result in a bus error (gpio_cpu_berr asserted).



Table 45. GPIO supervisor and user access modes

Address | Register | Access permission
0x00 | CpuIOCtrl | Supervisor data mode only
0x04 | CpuIOUserModeMask | Supervisor data mode only
0x08 | CpuIOSuperModeMask | Supervisor data mode only
0x0C | CpuIODirection | CpuIOUserModeMask and CpuIOSuperModeMask filtered
0x10 | CpuIOOut | CpuIOUserModeMask and CpuIOSuperModeMask filtered
0x14 | CpuIOIn | CpuIOUserModeMask and CpuIOSuperModeMask filtered
0x18 | CpuIOInDeglitch | CpuIOUserModeMask and CpuIOSuperModeMask filtered
0x20 to 0x24 | DeGlitchCount[1:0] | Supervisor data mode only
0x28 to 0x2C | DeGlitchClkSrc[1:0] | Supervisor data mode only
0x30 | DeGlitchSelect | Supervisor data mode only
0x34 | MotorCtrlUserModeEnable | Supervisor data mode only
0x38 to 0x3C | MotorMasterClkPeriod[1:0] | MotorCtrlUserModeEnable enabled
0x40 | MotorMasterClkSrc | MotorCtrlUserModeEnable enabled
0x44 to 0x50 | MotorCtrlHigh[3:0] | MotorCtrlUserModeEnable enabled
0x54 to 0x60 | MotorCtrlLow[3:0] | MotorCtrlUserModeEnable enabled
0x64 to 0x70 | MotorMasterClkSelect[3:0] | MotorCtrlUserModeEnable enabled
0x74 | MotorMasterClockEnable | MotorCtrlUserModeEnable enabled
0x78 | LEDCtrlUserModeEnable | Supervisor data mode only
0x7C | LEDDutySelect[0] | LEDCtrlUserModeEnable[0] enabled
0x80 | LEDDutySelect[1] | LEDCtrlUserModeEnable[1] enabled
0x84 | LEDDutySelect[2] | LEDCtrlUserModeEnable[2] enabled
0x88 | LEDDutySelect[3] | LEDCtrlUserModeEnable[3] enabled
0x8C | FreqAnaPinSelect | Supervisor data mode only
0x90 | FreqAnaPinFormSelect | Supervisor data mode only
0x94 | FreqAnaLastPeriod | Supervisor data mode only
0x98 | FreqAnaAverage | Supervisor data mode only
0x9C | FreqAnaCountInc | Supervisor data mode only
0xA0 | InterruptSrcSelect | Supervisor data mode only
0xA4 | DebugSelect | Supervisor data mode only
0xA8 to 0xAC | MotorMasterCount | Supervisor data mode only
13.10.3 GPIO partition

Figure 43. GPIO partition



13.10.4 IO control

The IO control block connects the IO pin drivers to internal signalling, based on the configured setup registers and the debug control signals.

The motor, LED pins, ISI and LSS control logic: 
// all 14 GPIO pins (motor, LED, LSS and ISI)
for (i=0; i<14; i++) {
  if (debug_cntrl[i] == 1) then            // debug control has priority
    gpio_e[i] = 1
    gpio_o[i] = debug_data_out[i]
    cpu_io_in[i] = gpio_i[i]
  elsif (cpu_io_ctrl[i] == 1) then         // direct CPU control
    gpio_e[i] = cpu_io_dir[i]
    gpio_o[i] = cpu_io_out[i]
    cpu_io_in[i] = gpio_i[i]
  else
    // default control
    if (i < 4) then                        // motor control pins
      gpio_e[i] = 1
      gpio_o[i] = motor_ctrl[i]
      cpu_io_in[i] = gpio_i[i]
    elsif (i < 8) then                     // LED pins (LED 0 is pin 4)
      gpio_e[i] = 1
      gpio_o[i] = led_ctrl[i-4]
      cpu_io_in[i] = gpio_i[i]
    elsif (i < 10) then                    // LSS interface clock pins
      gpio_e[i] = 1
      gpio_o[i] = lss_gpio_clk[i-8]
      cpu_io_in[i] = gpio_i[i]
    elsif (i < 12) then                    // LSS interface data pins
      gpio_e[i] = lss_gpio_e[i-10]
      gpio_o[i] = lss_gpio_do[i-10]
      lss_gpio_di[i-10] = gpio_i[i]
    else                                   // ISI interface pins
      gpio_e[i] = isi_gpio_e[i-12]
      gpio_o[i] = isi_gpio_dout[i-12]
      isi_gpio_din[i-12] = gpio_i[i]
}
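As a quick cross-check of the priority order above (debug control, then direct CPU control, then the default pin functions), the pin assignment can be modelled in a few lines of Python. This is a sketch, not part of the design: the function name pin_owner and the encoding of debug_cntrl and cpu_io_ctrl as 14-bit masks are assumptions, as is the LED index i-4 (led_ctrl is assumed to carry one bit per LED, as in the LED pulse generator).

```python
def pin_owner(i, debug_cntrl=0, cpu_io_ctrl=0):
    """Return (driver, index) for GPIO pin i.

    debug_cntrl and cpu_io_ctrl are modelled as 14-bit masks with one bit
    per pin (an assumption made for this sketch).
    """
    if not 0 <= i < 14:
        raise ValueError("GPIO pins are numbered 0 to 13")
    if (debug_cntrl >> i) & 1:      # debug control wins over everything
        return ("debug", i)
    if (cpu_io_ctrl >> i) & 1:      # then direct CPU control
        return ("cpu", i)
    # default control: fixed functions by pin range
    if i < 4:
        return ("motor_ctrl", i)
    if i < 8:
        return ("led_ctrl", i - 4)  # assumed LED numbering
    if i < 10:
        return ("lss_gpio_clk", i - 8)
    if i < 12:
        return ("lss_gpio_data", i - 10)
    return ("isi_gpio", i - 12)


# Default function of every pin when no override is active.
for pin in range(14):
    print(pin, pin_owner(pin))
```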

13.10.5 LED pulse generator 

The pulse generator logic consists of a 7-bit counter that is incremented on a 1µs pulse from the timers block (tim_pulse[0]). The LED control signal is generated by comparing the count value with the configured duty cycle for the LED (led_duty_sel).

The logic is given by: 

for (i=0; i<4; i++) {                 // for each LED pin
  // period divided into 8 segments
  period_div8 = cnt[6:4]
  if (period_div8 <= led_duty_sel[i]) then
    led_ctrl[i] = 1
  else
    led_ctrl[i] = 0
  // in the higher half, invert the LED control
  if (cnt[6] == 1) then
    led_ctrl[i] = ~led_ctrl[i]
}

// update the counter on every 1us pulse
if (tim_pulse[0] == 1) then
  cnt++
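The LED logic can be transcribed directly into Python to see which counter values drive an LED high for a given duty select. The sketch below follows the pseudocode as written, including the inversion in the upper half of the count; the function names are hypothetical and one call corresponds to one counter value.

```python
def led_ctrl(cnt, duty_sel):
    """LED output for one pin at counter value cnt (transcribed pseudocode)."""
    cnt &= 0x7F                      # 7-bit counter
    period_div8 = (cnt >> 4) & 0x7   # cnt[6:4]: period split into 8 segments
    out = 1 if period_div8 <= duty_sel else 0
    if (cnt >> 6) & 1:               # cnt[6]: upper half of the count
        out ^= 1                     # invert the LED control
    return out


def led_waveform(duty_sel):
    """Output over one 128-step counter period (one step per 1us pulse)."""
    return [led_ctrl(cnt, duty_sel) for cnt in range(128)]


# Number of "on" steps per 128-step period for each duty select value.
print([sum(led_waveform(d)) for d in range(8)])
```

Summing the on-steps for each led_duty_sel value shows that, as transcribed, the on-time rises to the full period at led_duty_sel = 3 and then falls again, which is worth checking against the intended duty-cycle encoding.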

13.10.6 Motor control

The motor controller consists of 2 counters and 4 phase generator logic blocks, one per motor control pin. The counters decrement each time a timing pulse (cnt_en) is received. The counters start at the configured clock period value (motor_mas_clk_period) and decrement to zero. If the counters are enabled (via motor_mas_clk_enable), they automatically restart at the configured clock period value; otherwise they wait until they are re-enabled.
The timing pulse period is one of pclk, 1µs, 100µs or 10ms, depending on the motor_mas_clk_src signal. The counters are used to derive the phase and duty cycle of each motor control pin.

// decrement logic
if (cnt_en == 1) then
  if ((mas_cnt == 0) AND (motor_mas_clk_enable == 1)) then
    mas_cnt = motor_mas_clk_period[15:0]
  elsif ((mas_cnt == 0) AND (motor_mas_clk_enable == 0)) then
    mas_cnt = 0
  else
    mas_cnt--
else                                   // hold the value
  mas_cnt = mas_cnt
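The master counter decrement rules can be modelled as follows. This is a behavioural sketch in which each call to step represents one pclk cycle and cnt_en is the selected timing pulse; the class and method names are illustrative only.

```python
class MasterCounter:
    """One motor master clock counter (mas_cnt), per the decrement logic."""

    def __init__(self, period, enable=True):
        self.period = period & 0xFFFF   # motor_mas_clk_period[15:0]
        self.enable = enable            # motor_mas_clk_enable
        self.mas_cnt = 0

    def step(self, cnt_en):
        """Advance one pclk cycle; cnt_en is the selected timing pulse."""
        if cnt_en:
            if self.mas_cnt == 0:
                # reload only while enabled, otherwise park at zero
                self.mas_cnt = self.period if self.enable else 0
            else:
                self.mas_cnt -= 1
        # when cnt_en is low the value is held
        return self.mas_cnt


counter = MasterCounter(period=3)
print([counter.step(True) for _ in range(8)])  # [3, 2, 1, 0, 3, 2, 1, 0]
```

With a programmed period of P the counter repeats every P + 1 timing pulses (P down to 0), and clearing the enable parks it at zero.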



Figure 44. Motor control RTL diagram

The phase generator block generates the motor control logic based on the selected clock generator (motor_mas_clk_sel), the motor control high transition point (motor_ctrl_high) and the motor control low transition point (motor_ctrl_low). There are 4 instances, one per motor control pin.

The logic is given by: 

// select the input counter to use
if (motor_mas_clk_sel == 1) then
  count = mas_cnt[1]
else
  count = mas_cnt[0]
// generate the phase and duty cycle
if ((motor_ctrl == 1) AND (count == motor_ctrl_low)) then
  motor_ctrl = 0
elsif ((motor_ctrl == 0) AND (count == motor_ctrl_high)) then
  motor_ctrl = 1
else
  motor_ctrl = motor_ctrl              // remain the same
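A minimal Python model of one phase generator, assuming it is fed the sequence of values from the master counter selected by motor_mas_clk_sel. Only changes of count matter, because the comparisons are level based; the function name phase_generator is hypothetical.

```python
def phase_generator(counts, ctrl_high, ctrl_low, motor_ctrl=0):
    """Return motor_ctrl after each count value from the selected counter."""
    out = []
    for count in counts:
        if motor_ctrl == 1 and count == ctrl_low:
            motor_ctrl = 0          # low transition point
        elif motor_ctrl == 0 and count == ctrl_high:
            motor_ctrl = 1          # high transition point
        # otherwise remain the same
        out.append(motor_ctrl)
    return out


# A master counter with period 7 counts 7, 6, ..., 0 and repeats.
counts = list(range(7, -1, -1)) * 2
print(phase_generator(counts, ctrl_high=6, ctrl_low=2))
# [0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
```

Here the output is high from the motor_ctrl_high point (6) until the motor_ctrl_low point (2), i.e. for 4 of every 8 counts; giving a second pin shifted transition points on the same counter yields a phase-shifted waveform.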
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13.10.7 Input deglitch 

The input deglitch logic rejects input states of duration less than the configured number of time units (deglitch_cnt); input states of greater duration are reflected on the output cpu_io_in_deglitch. The time units used by the deglitch circuit (pclk, 1µs, 100µs or 10ms) are selected by the deglitch_clk_src bus.

There are 2 possible sets of deglitch_cnt and deglitch_clk_src that can be used to deglitch the input pins. The values used are selected by the deglitch_sel signal.

Each input pin can be used to generate an interrupt. The interrupt can be generated from the raw input signal or a deglitched version of the input. The interrupt source is selected by the interrupt_src_select signal.

The counter logic is given by 

if (cpu_io_in != cpu_io_in_delay) then
  cnt = deglitch_cnt
  output_en = 0
elsif (cnt == 0) then
  cnt = cnt
  output_en = 1
elsif (cnt_en == 1) then
  cnt--
  output_en = 0
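The deglitch counter can be modelled per pin as below. The counter logic follows the pseudocode; how output_en updates cpu_io_in_deglitch is not spelled out in the text, so loading the current (stable) input when output_en is high is an assumption based on the Compare and output_en blocks in Figure 45. Class and method names are illustrative.

```python
class Deglitch:
    """Input deglitch for one pin, following the counter logic above."""

    def __init__(self, deglitch_cnt):
        self.deglitch_cnt = deglitch_cnt
        self.cnt = deglitch_cnt
        self.cpu_io_in_delay = 0
        self.cpu_io_in_deglitch = 0

    def step(self, cpu_io_in, cnt_en=True):
        """Advance one pclk cycle; cnt_en is the selected time-unit pulse."""
        if cpu_io_in != self.cpu_io_in_delay:
            self.cnt = self.deglitch_cnt   # input changed: restart the count
            output_en = False
        elif self.cnt == 0:
            output_en = True               # input stable for long enough
        elif cnt_en:
            self.cnt -= 1
            output_en = False
        else:
            output_en = False
        if output_en:
            # assumption: the output register loads the stable input
            self.cpu_io_in_deglitch = cpu_io_in
        self.cpu_io_in_delay = cpu_io_in
        return self.cpu_io_in_deglitch


glitchy = Deglitch(deglitch_cnt=2)
print([glitchy.step(x) for x in [1, 1, 0, 0, 0]])  # [0, 0, 0, 0, 0]
steady = Deglitch(deglitch_cnt=2)
print([steady.step(1) for _ in range(5)])          # [0, 0, 0, 1, 1]
```

With deglitch_cnt = 2 and cnt_en asserted every cycle, the 2-cycle pulse never reaches the output, while a steady input appears after its fourth sample.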



Figure 45. Input de-glitch RTL diagram



13.10.8 Frequency Analyser 

The frequency analyser block monitors a selected input pin (selected by FreqAnaPinSelect and FreqAnaPinFormSelect) and detects positive edges. Between successive positive edges detected on the input pin, it increments a counter by a programmed amount (FreqAnaCountInc) on each clock cycle. When a positive edge is detected, the FreqAnaLastPeriod register is updated with the top 16 bits of the counter and the counter is reset. The frequency analyser also maintains a running average of the FreqAnaLastPeriod register. Each time a positive edge is detected on the input pin, the FreqAnaAverage register is updated with the newly calculated FreqAnaLastPeriod. The average is calculated as 7/8 of the current value plus 1/8 of the new value. Both the FreqAnaLastPeriod and FreqAnaAverage registers can be written to by the CPU.



The pseudocode is given by 

if ((pin == 1) AND (pin_delay == 0)) then       // positive edge detected
  freq_ana_lastperiod = count[31:16]
  freq_ana_average = freq_ana_average - freq_ana_average/8 + freq_ana_lastperiod/8
  count = 0
else
  count = count + freq_ana_count_inc
// implement the configuration register write
if (wr_last_en == 1) then
  freq_ana_lastperiod = wr_data
elsif (wr_average_en == 1) then
  freq_ana_average = wr_data
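The analyser pseudocode can be modelled cycle by cycle in Python to check the period and average arithmetic. This sketch assumes pin is already the selected (raw or deglitched) input and omits the CPU write path; the class name is hypothetical.

```python
class FreqAnalyser:
    """Frequency analyser for one pin, following the pseudocode above."""

    def __init__(self, count_inc):
        self.count_inc = count_inc   # FreqAnaCountInc
        self.count = 0               # 32-bit accumulator
        self.last_period = 0         # FreqAnaLastPeriod (16 bits)
        self.average = 0             # FreqAnaAverage (16 bits)
        self.pin_delay = 0

    def step(self, pin):
        """Advance one pclk cycle with the current (selected) pin value."""
        if pin == 1 and self.pin_delay == 0:          # positive edge
            self.last_period = (self.count >> 16) & 0xFFFF
            # 7/8 of the old average plus 1/8 of the new period
            self.average = (self.average - self.average // 8
                            + self.last_period // 8) & 0xFFFF
            self.count = 0
        else:
            self.count = (self.count + self.count_inc) & 0xFFFFFFFF
        self.pin_delay = pin


fa = FreqAnalyser(count_inc=1 << 20)      # adds 16 to count[31:16] per cycle
for level in ([1] * 5 + [0] * 5) * 20:    # 10-cycle square wave
    fa.step(level)
print(fa.last_period, fa.average)
```

With FreqAnaCountInc = 1 << 20, each clock adds 16 to count[31:16]. As written, the cycle on which the edge is detected does not add to count, so a 10-cycle input period is reported as 9 * 16 = 144; the running average approaches that value from below, slowed further by the integer divisions.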



Figure 46. Frequency analyser RTL diagram
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14 Interrupt Controller Unit (ICU) 

The interrupt controller accepts up to N input interrupt sources, determines their priority, arbitrates based on the highest priority and generates an interrupt request to the CPU. The ICU complies with the interrupt acknowledge protocol of the CPU. Once the CPU accepts an interrupt (i.e. processing of its service routine begins), the interrupt controller will assert the next arbitrated interrupt if one is pending.

Each interrupt source has a fixed vector number N, and an associated configuration register, IntReg[N]. 
The format of the IntReg[N] register is shown in Table 46 below. 



Table 46. IntReg[N] register format

Field      Bit(s)   Description
Priority   7:0      Interrupt priority
Type       9:8      Determines the triggering conditions for the interrupt:
                    00 - Positive edge
                    10 - Negative edge
                    01 - Positive level
                    11 - Negative level
Mask       10       Mask bit.
                    1 - Interrupts from this source are enabled.
                    0 - Interrupts from this source are disabled.
                    Note that there may be additional masks in operation at
                    the source of the interrupt.
Reserved   31:11    Reserved. Write as 0.
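Software writing IntReg[N] has to pack the three fields above. A minimal Python sketch of the packing and unpacking, with hypothetical helper names and trigger names chosen for readability:

```python
# Trigger-type encodings for IntReg[N] bits 9:8 (Table 46).
TYPE_CODES = {"pos_edge": 0b00, "pos_level": 0b01,
              "neg_edge": 0b10, "neg_level": 0b11}


def encode_intreg(priority, trigger, enabled):
    """Pack the IntReg[N] fields; reserved bits 31:11 are written as 0."""
    if not 0 <= priority <= 0xFF:
        raise ValueError("priority is an 8-bit field")
    return (int(enabled) << 10) | (TYPE_CODES[trigger] << 8) | priority


def decode_intreg(value):
    """Unpack an IntReg[N] value into (priority, trigger, enabled)."""
    code = (value >> 8) & 0b11
    trigger = next(name for name, bits in TYPE_CODES.items() if bits == code)
    return value & 0xFF, trigger, bool((value >> 10) & 1)


print(hex(encode_intreg(0xF3, "neg_level", True)))  # 0x7f3
```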



Once an interrupt is received, the interrupt controller determines its priority, maps the programmed priority to the available CPU priority levels and then issues an interrupt to the CPU. The mapping of programmed priority to native interrupt levels is fixed and depends on the choice of CPU.

For example, the LEON CPU has 15 levels available, which allows 16 sub-priorities per level (as each level is in itself a priority). In this case priorities 255-240 map to level 15, 239-224 to level 14 and so on, with priorities 15-0 corresponding to level 0. Level 0 is no interrupt. Level 15 is the highest interrupt level.
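The fixed LEON mapping described above amounts to taking the top four bits of the programmed priority, which is also how the arbiter derives the CPU level (int_priority[7:4]). A Python sketch with hypothetical helper names:

```python
def leon_level(priority):
    """Map an 8-bit programmed priority to a LEON interrupt level (0-15)."""
    if not 0 <= priority <= 0xFF:
        raise ValueError("priority is an 8-bit value")
    return priority >> 4          # int_priority[7:4]


def sub_priority(priority):
    """Ordering within a level; only effective while the interrupt is pending."""
    return priority & 0xF         # int_priority[3:0]


for p in (255, 240, 239, 224, 15):
    print(p, leon_level(p))       # levels 15, 15, 14, 14, 0
```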



14.1 Interrupt preemption 

There are two types of pre-emption possible: standard LEON pre-emption and SoPEC pending pre-emption. With standard LEON pre-emption, an interrupt can only be pre-empted by an interrupt with a higher priority level. If an interrupt with the same priority level (1 to 15) as the interrupt being serviced becomes pending, it is not acknowledged until the current service routine has completed. SoPEC pending pre-emption is an extension of the standard LEON scheme which is made possible by the programmable priority levels in the IntReg[N] register.

Interrupts with a higher sub-priority will pre-empt interrupts with a lower sub-priority but the same priority level mapping, provided the lower sub-priority interrupt has not yet been acknowledged by the CPU (i.e. it is still pending). If an interrupt with a higher sub-priority arrives while an interrupt with a lower sub-priority at the same level is being serviced, it will not be serviced until the lower sub-priority service routine has completed.

Thus, when pre-emption is required, interrupts should be programmed to different levels, as interrupt priorities within the same level have no guaranteed servicing order.

The interrupt is directly acknowledged by the CPU, and the ICU automatically clears the pending bit of acknowledged interrupts.

All interrupt controller registers are only accessible in supervisor data mode. If user code wishes to mask an interrupt, it must request this from the supervisor, and the supervisor software will resolve user access levels.

14.2 Interrupt sources

The mapping of interrupt sources to interrupt vectors (and therefore IntReg[N] registers) is shown in Table 47 below. Please refer to the appropriate section of this specification for more details of the interrupt sources.



Table 47. Interrupt sources vector table

Vector  Source  Description
0       Timers  WatchDog Timer update request
1       Timers  Generic Timer 1 interrupt
2       Timers  Generic Timer 2 interrupt
3       Timers  Generic Timer 3 interrupt
4-17    GPIO    GPIO general interrupt, source pin 0-13
18      MMU     MMU security violation
19      SCB     USB interrupt
20      SCB     ISI interrupt
21      SCB     DMA interrupt
22      LSS     LSS interrupt, LSS interface 0 interrupt request
23      LSS     LSS interrupt, LSS interface 1 interrupt request
24      PCU     PEP sub-system interrupt - CDU finished band
25      PCU     PEP sub-system interrupt - CDU error
26      PCU     PEP sub-system interrupt - LBD finished band
27      PCU     PEP sub-system interrupt - TE finished band
28      PCU     PEP sub-system interrupt - PCU finished band
29      PCU     PEP sub-system interrupt - PCU invalid address interrupt
30      PCU     PEP sub-system interrupt - PHI buffer underrun
31      PCU     PEP sub-system interrupt - PHI page finished
32      PCU     PEP sub-system interrupt - PHI print ready
33      PHI     PEP sub-system interrupt - PHI line sync interrupt
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14.3 Implementation 

14.3.1 Definitions of I/O

Table 48. Interrupt Controller Unit I/O definition

Port name                 Pins  I/O  Description
Clocks and Resets
pclk                      1     In   System Clock
prst_n                    1     In   System reset, synchronous active low
CPU Interface
cpu_adr[7:2]              6     In   CPU address bus. Only 6 bits are required to decode
                                     the address space for the ICU block
cpu_dataout[31:0]         32    In   Shared write data bus from the CPU
icu_cpu_data[31:0]        32    Out  Read data bus to the CPU
cpu_rwn                   1     In   Common read/not-write signal from the CPU
cpu_icu_sel               1     In   Block select from the CPU. When cpu_icu_sel is high
                                     both cpu_adr and cpu_dataout are valid
icu_cpu_rdy               1     Out  Ready signal to the CPU. When icu_cpu_rdy is high it
                                     indicates the last cycle of the access. For a write
                                     cycle this means cpu_dataout has been registered by
                                     the ICU block and for a read cycle this means the
                                     data on icu_cpu_data is valid.
icu_cpu_ilevel[3:0]       4     Out  Indicates the priority level of the current active
                                     interrupt.
cpu_iack                  1     In   Interrupt request acknowledge from the LEON core.
cpu_icu_ilevel[3:0]       4     In   Interrupt acknowledged level from the LEON core
icu_cpu_berr              1     Out  Bus error signal to the CPU indicating an invalid
                                     access.
cpu_acode[1:0]            2     In   CPU Access Code signals. These decode as follows:
                                     00 - User program access
                                     01 - User data access
                                     10 - Supervisor program access
                                     11 - Supervisor data access
icu_cpu_debug_valid       1     Out  Debug data valid on icu_cpu_data bus. Active high
Interrupts
tim_icu_wd_irq            1     In   Watchdog timer interrupt signal from the Timers block
tim_icu_irq[2:0]          3     In   Generic timer interrupt signals from the Timers block
gpio_icu_irq[13:0]        14    In   GPIO pin interrupts
mmu_icu_irq               1     In   Memory Management Unit interrupt
usb_icu_irq               1     In   USB interrupt from the SCB
isi_icu_irq               1     In   ISI interrupt from the SCB
dma_icu_irq               1     In   DMA interrupt from the SCB
lss_icu_irq[1:0]          2     In   LSS interface interrupt request
cdu_finishedband          1     In   Finished band interrupt request from the CDU
cdu_icu_jpegerror         1     In   JPEG error interrupt from the CDU
lbd_finishedband          1     In   Finished band interrupt request from the LBD
te_finishedband           1     In   Finished band interrupt request from the TE
pcu_finishedband          1     In   Finished band interrupt request from the PCU
pcu_icu_address_invalid   1     In   Invalid address interrupt request from the PCU
phi_icu_underrun          1     In   Buffer underrun interrupt request from the PHI
phi_icu_page_finish       1     In   Page finished interrupt request from the PHI
phi_icu_print_rdy         1     In   Print ready interrupt request from the PHI
phi_icu_linesync_int      1     In   Line sync interrupt request from the PHI



14.3.2 Configuration registers 

The configuration registers in the ICU are programmed via the CPU interface. Refer to section 11.4 on page 69 for a description of the protocol and timing diagrams for reading and writing registers in the ICU. Note that since addresses in SoPEC are byte aligned and the CPU only supports 32-bit register reads and writes, the lower 2 bits of the CPU address bus are not required to decode the address space for the ICU. When reading a register that is less than 32 bits wide, zeros should be returned on the upper unused bit(s) of icu_cpu_data. Table 49 lists the configuration registers in the ICU block.

The ICU block will only allow supervisor data mode accesses (i.e. cpu_acode[1:0] = SUPERVISOR_DATA). All other accesses will result in icu_cpu_berr being asserted.



Table 49. ICU Register Map

Address       Register         #Bits  Reset        Description
0x00 - 0x84   IntReg[33:0]     34x11  0x000        Interrupt vector configuration register
0x88 - 0x8C   IntClear[1:0]    2x32   0x0000_0000  Interrupt pending clear register. If written
                                                   with a one, it clears the corresponding
                                                   interrupt.
                                                   IntClear[0] - interrupt sources 31 to 0
                                                   IntClear[1] - interrupt sources 33 to 32
0x90 - 0x94   IntPending[1:0]  2x32   0x0000_0000  Interrupt pending register (read only).
                                                   IntPending[0] - interrupt sources 31 to 0
                                                   IntPending[1] - interrupt sources 33 to 32
0x98          IntSource        6      0x00         Indicates the interrupt source of the current
                                                   winning active interrupt (read only).
0x9C          DebugSelect      6      0x00         Debug address select. Indicates the address
                                                   of the register to report on the icu_cpu_data
                                                   bus when it is not otherwise being used.
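Because IntClear and IntPending each split the 34 sources across two 32-bit registers, locating a source's bit is a divide-and-remainder by 32. The helpers below sketch that addressing from the offsets in Table 49; their names are hypothetical, and they compute offsets and bit masks only, not bus accesses.

```python
INTREG_BASE = 0x00       # IntReg[33:0]
INTCLEAR_BASE = 0x88     # IntClear[1:0]
INTPENDING_BASE = 0x90   # IntPending[1:0]
NUM_SOURCES = 34


def _check(source):
    if not 0 <= source < NUM_SOURCES:
        raise ValueError("the ICU has interrupt vectors 0 to 33")


def intreg_offset(source):
    """Address offset of IntReg[source]."""
    _check(source)
    return INTREG_BASE + 4 * source


def intclear_write(source):
    """(address offset, data) that clears one pending interrupt source."""
    _check(source)
    return INTCLEAR_BASE + 4 * (source // 32), 1 << (source % 32)


def is_pending(pending_words, source):
    """Test a source's bit in the two IntPending words read from 0x90/0x94."""
    _check(source)
    return bool((pending_words[source // 32] >> (source % 32)) & 1)


print(hex(intreg_offset(33)), intclear_write(33))  # 0x84 (140, 2)
```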
14.3.3 ICU partition

Figure 47. ICU partition



14.3.4 Interrupt detect 

The ICU contains multiple instances of the interrupt detect block, one per interrupt source. The interrupt detect block examines the interrupt source signal and determines whether it should generate a pending request (int_pend) based on the configured interrupt type and the interrupt source conditions. If the interrupt is not masked, it is reflected to the interrupt arbiter via the int_active signal. Once an interrupt is pending, it remains pending until it is accepted by the CPU or, if it is level sensitive, until the level is removed. Masking a pending interrupt removes it from arbitration, but the interrupt still remains pending.

When the CPU accepts the interrupt (using the normal ISR mechanism), the interrupt controller automatically generates an interrupt clear for that interrupt source (cpu_int_clear). Alternatively, if the interrupt is masked, the CPU can determine pending interrupts by polling the IntPending registers. Any active pending interrupts can be cleared by the CPU without using an ISR via the IntClear registers.
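The detection rules for one source can be modelled in Python as a cross-check of the logic shown below. This is a sketch under stated assumptions: the pending clear is applied before the trigger tests in the same cycle, and the edge tests compare int_src with its value from the previous cycle (last_int_src). The class name is hypothetical.

```python
POS_EDGE, POS_LEVEL, NEG_EDGE, NEG_LEVEL = 0b00, 0b01, 0b10, 0b11


class InterruptDetect:
    """One interrupt detect block (behavioural sketch)."""

    def __init__(self, int_config):
        self.int_config = int_config   # IntReg[N]
        self.last_int_src = 0
        self.last_int_pend = 0

    def step(self, int_src, int_clear=0, cpu_int_clear=0):
        """Advance one pclk cycle and return int_active."""
        mask = (self.int_config >> 10) & 1
        kind = (self.int_config >> 8) & 0b11
        int_pend = self.last_int_pend
        if int_clear or cpu_int_clear:
            int_pend = 0
        if kind == NEG_LEVEL and int_src == 0:
            int_pend = 1
        elif kind == POS_LEVEL and int_src == 1:
            int_pend = 1
        elif kind == NEG_EDGE and int_src == 0 and self.last_int_src == 1:
            int_pend = 1
        elif kind == POS_EDGE and int_src == 1 and self.last_int_src == 0:
            int_pend = 1
        int_active = int_pend if mask else 0   # masking hides, does not clear
        self.last_int_src = int_src
        self.last_int_pend = int_pend
        return int_active


det = InterruptDetect((1 << 10) | (POS_EDGE << 8) | 0x40)
print([det.step(1), det.step(1), det.step(1, cpu_int_clear=1), det.step(0)])
# [1, 1, 0, 0]
```

The positive-edge source stays pending after its edge until it is cleared, and a masked source still records the pending bit even though int_active stays low.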

The logic is shown below: 

mask = int_config[10]
type = int_config[9:8]
int_priority = int_config[7:0]
int_pend = last_int_pend                 // the last pending interrupt
// update the pending FF
if ((int_clear == 1) OR (cpu_int_clear == 1)) then
  int_pend = 0
// test for interrupt condition
if ((type == NEG_LEVEL) AND (int_src == 0)) then
  int_pend = 1
elsif ((type == POS_LEVEL) AND (int_src == 1)) then
  int_pend = 1
elsif ((type == NEG_EDGE) AND (int_src == 0) AND (last_int_src == 1)) then
  int_pend = 1
elsif ((type == POS_EDGE) AND (int_src == 1) AND (last_int_src == 0)) then
  int_pend = 1
else
  int_pend = int_pend                    // stay the same as before
// mask the pending bit
if (mask == 1) then
  int_active = int_pend
else
  int_active = 0
// assign the registers
last_int_src = int_src
last_int_pend = int_pend

14.3.5 Interrupt arbiter

The interrupt arbiter logic arbitrates a winning interrupt request from multiple pending requests based on configured priority. It generates the interrupt to the CPU by setting icu_cpu_ilevel to a non-zero value. The priority of the interrupt is reflected by the value assigned to icu_cpu_ilevel: the higher the value, the higher the priority, 15 being the highest. The current winning interrupt is reported to the CPU via the IntSource register generated in the interrupt arbiter block.

// arbitrate based on priority
if (arb_enable == 1) then
  // arbitrate with the current winner
  win_int_priority = 0
  int_src = 0
  int_request = 0
  for (i=0; i<34; i++) {
    if (int_active[i] == 1) then {
      if (int_priority[i] > win_int_priority) then {
        win_int_priority = int_priority[i]
        int_src = i
        int_request = 1
      }
    }
  }
  // assign the CPU interrupt level
  int_ilevel = int_priority[int_src][7:4]
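A Python sketch of one arbitration pass over the 34 int_active and int_priority signals, mirroring the arbiter loop. The function name is hypothetical. Because the comparison is a strict greater-than starting from 0, a source programmed with priority 0 can never win, and among equal priorities the lowest-numbered source wins.

```python
NUM_SOURCES = 34


def arbitrate(int_active, int_priority):
    """Return (int_request, int_src, int_ilevel) for one arbitration pass."""
    win_int_priority = 0
    int_src = 0
    int_request = 0
    for i in range(NUM_SOURCES):
        # strict '>' means the lowest-numbered source wins a tie
        if int_active[i] and int_priority[i] > win_int_priority:
            win_int_priority = int_priority[i]
            int_src = i
            int_request = 1
    int_ilevel = int_priority[int_src] >> 4     # int_priority[int_src][7:4]
    return int_request, int_src, int_ilevel


active = [0] * NUM_SOURCES
priority = [0] * NUM_SOURCES
active[3], priority[3] = 1, 0x35
active[20], priority[20] = 1, 0xA1
print(arbitrate(active, priority))   # (1, 20, 10)
```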



14.3.6 Interrupt controller 



The interrupt controller is responsible for generating the interrupt to the CPU, accepting the interrupt acknowledge from the CPU and clearing the interrupt source pending bit.
The exact procedure is CPU dependent, but examples are given for the LEON processor. See section 11.9 on page 98 for a complete description of the interrupt handling procedure.



State description (the machine remains in the same state by default, and all outputs are zero unless otherwise stated):

Reset: Normal reset state
IntPend: Interrupt pending, waiting for CPU acknowledge (cpu_ilevel = int_ilevel)
IntClear: Interrupt clear; clear the pending bit for the current interrupt vector (cpu_int_clear[int_src] = 1, arb_enable = 0)

Figure 48. Interrupt controller state diagram

After reset, the interrupt controller remains in the Reset state until the interrupt arbiter indicates that there is an active interrupt pending (int_request equal to 1). The state machine then goes to the IntPend state and signals to the CPU that an interrupt is pending. The machine remains in the IntPend state until the interrupt is acknowledged by the CPU or the pending interrupt condition is removed.

When the interrupt is acknowledged, the state machine goes to the IntClear state to clear the pending bit of the interrupt source.

On completion, the state machine returns to the Reset state and again waits for the next pending interrupt.
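The state transitions above can be sketched as a small Python state machine. Two details are assumptions rather than stated in the text: the acknowledge is taken to be the cpu_iack signal from Table 48, and IntClear is assumed to last a single cycle before returning to Reset.

```python
from enum import Enum


class State(Enum):
    RESET = "Reset"
    INT_PEND = "IntPend"
    INT_CLEAR = "IntClear"


def next_state(state, int_request, cpu_iack):
    """One transition of the interrupt controller state machine."""
    if state is State.RESET:
        return State.INT_PEND if int_request else State.RESET
    if state is State.INT_PEND:
        if cpu_iack:
            return State.INT_CLEAR      # CPU acknowledged the interrupt
        if not int_request:
            return State.RESET          # pending condition was removed
        return State.INT_PEND
    return State.RESET                  # IntClear completes in one cycle (assumed)


state = State.RESET
for int_request, cpu_iack in [(0, 0), (1, 0), (1, 0), (1, 1), (0, 0)]:
    state = next_state(state, int_request, cpu_iack)
    print(state.value)   # Reset, IntPend, IntPend, IntClear, Reset
```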
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15 Timers Block (TIM) 



The Timers block contains general purpose timers, a watchdog timer and a timing pulse generator for use in other sections of SoPEC.

15.1 Watchdog timer

The watchdog timer is a 32-bit counter which counts down each time a timing pulse is received. The period of the timing pulse is selected by the WatchDogUnitSel register. The value at any time can be read from the WatchDogTimer register, and the counter can be reset by writing a non-zero value to the register. Should the counter reach 1, a system wide reset will be triggered as if the reset came from a hardware pin.

The watchdog timer can be polled by the CPU and reset each time it gets close to 1, or alternatively a threshold (WatchDogIntThres) can be set to trigger an interrupt for the watchdog timer to be serviced by the CPU. This interrupt can be effectively masked by setting the threshold to zero. The watchdog timer can be disabled, without causing a reset, by writing zero to the WatchDogTimer register.

15.2 Timing pulse generator

The timing block contains a timing pulse generator clocked by the system clock, used to generate timing pulses of 1µs, 100µs and 10ms. Each pulse is of one system clock duration and is active high, with the pulse period accurate to the system clock frequency.

The timing pulse generator also contains a 64-bit free running counter that can be read or reset by accessing the FreeRunCount register.

15.3 Generic timers

SoPEC contains 3 programmable generic timing counters for use by the CPU to time the system. The timers are programmed to a particular value and count down each time a timing pulse is received. If a particular timer decrements to 0, an interrupt is generated. The counter can be programmed to automatically restart the count, or to wait until re-programmed by the CPU. At any time the status of the counter can be read from GenCntValue, or the counter can be reset by writing to the GenCntValue register. The auto-restart is activated by setting the GenCntAuto register; when activated, the counter restarts at GenCntStartValue. A counter can be stopped or started at any time, without affecting the contents of the GenCntValue register, by writing 0 (stop) or 1 (start) to the relevant GenCntEnable register.
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15.4 Implementation 

15.4.1 Definitions of I/O 



Table 50. Timers block I/O definition

Port name              Pins  I/O  Description
Clocks and Resets
pclk                   1     In   System Clock
prst_n                 1     In   System reset, synchronous active low
tim_pulse[2:0]         3     Out  Timers block generated timing pulses, each one pclk wide:
                                  0 - 1µs pulse
                                  1 - 100µs pulse
                                  2 - 10ms pulse
CPU Interface
cpu_adr[6:2]           5     In   CPU address bus. Only 5 bits are required to decode the
                                  address space for the TIM block
cpu_dataout[31:0]      32    In   Shared write data bus from the CPU
tim_cpu_data[31:0]     32    Out  Read data bus to the CPU
cpu_rwn                1     In   Common read/not-write signal from the CPU
cpu_tim_sel            1     In   Block select from the CPU. When cpu_tim_sel is high both
                                  cpu_adr and cpu_dataout are valid
tim_cpu_rdy            1     Out  Ready signal to the CPU. When tim_cpu_rdy is high it
                                  indicates the last cycle of the access. For a write cycle
                                  this means cpu_dataout has been registered by the TIM
                                  block and for a read cycle this means the data on
                                  tim_cpu_data is valid.
tim_cpu_berr           1     Out  Bus error signal to the CPU indicating an invalid access.
cpu_acode[1:0]         2     In   CPU Access Code signals. These decode as follows:
                                  00 - User program access
                                  01 - User data access
                                  10 - Supervisor program access
                                  11 - Supervisor data access
tim_cpu_debug_valid    1     Out  Debug data valid on tim_cpu_data bus. Active high
Miscellaneous
tim_icu_wd_irq         1     Out  Watchdog timer interrupt signal to the ICU block
tim_icu_irq[2:0]       3     Out  Generic timer interrupt signals to the ICU block
tim_cpr_reset_n        1     Out  Watchdog timer system reset.




Doc: SoPEC_hardware_design
Version: 2.3 



S3 Proprietary Document 



29 Nov 2002 
Page 160 



SoPEC : Hardware Design 



15.4.2 Timers sub-block partition 



Figure 49. Timers sub-block partition diagram



15.4.3 Watchdog timer

The watchdog timer counts down from a pre-programmed value and generates a system wide reset when the count equals one. When the counter passes a pre-programmed threshold value (wdog_tim_thres), an interrupt is generated (tim_icu_wd_irq) requesting the CPU to update the counter. Setting the counter to zero disables the watchdog reset. In supervisor mode the watchdog counter can be written to or read from at any time; in user mode access is denied. Any access in user mode will generate a bus error.

Figure 50. Watchdog timer RTL diagram



The counter logic is given by 
if (wdog_wen == 1) then
  wdog_tim_cnt = wdog_tim_data          // load new data
elsif (wdog_tim_cnt == 0) then
  wdog_tim_cnt = wdog_tim_cnt           // count disabled
elsif (cnt_en == 1) then
  wdog_tim_cnt--
else
  wdog_tim_cnt = wdog_tim_cnt

The timer decode logic is 

if ((wdog_tim_cnt == wdog_tim_thres) AND (wdog_tim_cnt != 0)) then
  tim_icu_wd_irq = 1
else
  tim_icu_wd_irq = 0
// reset generator logic
if (wdog_tim_cnt == 1) then
  tim_cpr_reset_n = 0
else
  tim_cpr_reset_n = 1
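Combining the counter and decode logic gives the Python sketch below, with one step per pclk cycle. It is illustrative only: class and method names are hypothetical, the reset value comes from Table 51, and the model simply keeps counting where real hardware would already have reset the system when the count reached 1.

```python
class Watchdog:
    """Watchdog counter plus its decode logic (behavioural sketch)."""

    def __init__(self, thres):
        self.cnt = 0xFFFFFFFF    # WatchDogTimer reset value (Table 51)
        self.thres = thres       # WatchDogIntThres

    def write(self, value):
        """CPU write to WatchDogTimer (wdog_wen); zero disables the watchdog."""
        self.cnt = value & 0xFFFFFFFF

    def step(self, cnt_en=True):
        """Advance one pclk cycle; return (tim_icu_wd_irq, tim_cpr_reset_n)."""
        if self.cnt != 0 and cnt_en:
            self.cnt -= 1
        irq = 1 if (self.cnt == self.thres and self.cnt != 0) else 0
        reset_n = 0 if self.cnt == 1 else 1
        return irq, reset_n


wd = Watchdog(thres=2)
wd.write(4)
print([wd.step() for _ in range(4)])   # [(0, 1), (1, 1), (0, 0), (0, 1)]
```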



15.4.4 Generic timers

The generic timers block consists of 3 identical counters. A timer is set to a pre-configured value (GenCntStartValue) and counts down once per selected timing pulse (gen_unit_sel). The timer can be enabled or disabled at any time (gen_tim_en); when disabled, the counter is stopped but not cleared. The timer can be set to automatically restart (gen_tim_auto) after it hits zero. In supervisor mode a timer can be written to or read from at any time; in user mode access is determined by the GenCntUserModeEnable register settings.



Figure 51. Generic timer RTL diagram



The counter logic is given by 

if (gen_wen == 1) then
  gen_tim_cnt = gen_tim_data
elsif ((cnt_en == 1) AND (gen_tim_en == 1)) then
  if (gen_tim_cnt == 0) then            // counter may need re-starting
    if (gen_tim_auto == 1) then
      gen_tim_cnt = gen_tim_cnt_st_value
    else
      gen_tim_cnt = gen_tim_cnt
  else
    gen_tim_cnt--
else
  gen_tim_cnt = gen_tim_cnt

The decode logic is

if (gen_tim_cnt == 1) then
  tim_icu_irq = 1
else
  tim_icu_irq = 0
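The generic timer counter and decode logic can be sketched in Python as follows, one step per pclk cycle with cnt_en taken as the pulse selected by gen_unit_sel. Names are hypothetical.

```python
class GenericTimer:
    """One generic timer counter with its interrupt decode."""

    def __init__(self, start_value, auto):
        self.start_value = start_value   # GenCntStartValue
        self.auto = auto                 # GenCntAuto
        self.enable = True               # GenCntEnable
        self.cnt = 0                     # GenCntValue

    def write(self, value):
        """CPU write to GenCntValue (gen_wen)."""
        self.cnt = value & 0xFFFFFFFF

    def step(self, cnt_en=True):
        """Advance one pclk cycle; return tim_icu_irq for this timer."""
        if cnt_en and self.enable:
            if self.cnt == 0:
                if self.auto:            # restart, otherwise stay at zero
                    self.cnt = self.start_value
            else:
                self.cnt -= 1
        return 1 if self.cnt == 1 else 0


t = GenericTimer(start_value=3, auto=True)
t.write(3)
print([t.step() for _ in range(6)])   # [0, 1, 0, 0, 0, 1]
```

With auto-restart the interrupt decode fires once per pass through the count value 1, here every 4 timing pulses (start value 3, then 2, 1, 0).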
15.4.5 Timing pulse generator



The timing pulse generator contains a general free running 64-bit timer and 3 timing pulse generators producing timing pulses of one cycle duration with a period of 1µs, 100µs and 10ms. In supervisor mode the free running timer register can be written to or read from at any time; in user mode access is denied, and any access will result in a bus error. The status of each of the 1µs, 100µs and 10ms timers can be read by accessing the TimerPulseStatus register in supervisor mode.



Figure 52. Pulse generator RTL diagram



15.4.5.1 Free Run Timer 

The increment logic block increments the timer count on each clock cycle. The counter wraps around to zero and continues incrementing if overflow occurs. When the timing register (FreeRunCount) is written to, the configuration registers block will set free_run_wen high for a clock cycle, and the value on free_run_data will become the new count value for the 32 bits selected by the free_run_adr signal. If free_run_adr is 1, the higher 32 bits of the counter are written; otherwise the lower 32 bits are written. It is the responsibility of software to handle these writes in a sensible manner.

The increment logic is given by 
if (free_run_wen == 1) then
  if (free_run_adr == 1) then
    free_run_cnt[63:32] = free_run_data
  else
    free_run_cnt[31:0] = free_run_data
else
  free_run_cnt++
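A Python sketch of the free running counter, modelling the split 32-bit writes selected by free_run_adr. The class is hypothetical, and a write is taken to replace the increment in that cycle, as in the pseudocode.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


class FreeRunCounter:
    """64-bit free running counter with 32-bit half-word writes."""

    def __init__(self):
        self.cnt = 0

    def step(self, wen=False, adr=0, data=0):
        """Advance one pclk cycle; a write replaces the increment that cycle."""
        if wen:
            data &= MASK32
            if adr == 1:
                self.cnt = (data << 32) | (self.cnt & MASK32)      # bits 63:32
            else:
                self.cnt = (self.cnt & (MASK32 << 32)) | data      # bits 31:0
        else:
            self.cnt = (self.cnt + 1) & MASK64                     # wraps to 0
        return self.cnt

    def read(self, adr):
        """Value seen at FreeRunCount[adr]."""
        return (self.cnt >> 32) & MASK32 if adr == 1 else self.cnt & MASK32


counter = FreeRunCounter()
counter.step(wen=True, adr=1, data=0x1)   # set bits 63:32
counter.step()                            # next cycle increments
print(hex(counter.cnt))                   # 0x100000001
```

Because the two halves are accessed separately, the lower word can wrap between reading the two halves. One common software approach (an assumption, not stated in this specification) is to read the upper word, then the lower word, then the upper word again, and retry if it changed.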

15.4.5.2 Pulse Timers

The pulse timer logic generates timing pulses of 1 clock cycle length and a period of 1µs, 100µs and 10ms. The logic for the 1µs timer is given by:

// 1us generator
if (pulse_1us_cnt == 0) then
  pulse_1us_cnt = 159
  pulse_1us = 1
else
  pulse_1us_cnt--
  pulse_1us = 0

The logic for the 100µs timer is given by:

// 100us generator
if ((pulse_100us_cnt == 0) AND (pulse_1us == 1)) then
  pulse_100us_cnt = 99
  pulse_100us = 1
elsif (pulse_1us == 1) then
  pulse_100us_cnt--
  pulse_100us = 0
else
  pulse_100us_cnt = pulse_100us_cnt
  pulse_100us = 0

The logic for the 10ms timer is given by:

// 10ms generator
if ((pulse_10ms_cnt == 0) AND (pulse_100us == 1)) then
  pulse_10ms_cnt = 99
  pulse_10ms = 1
elsif (pulse_100us == 1) then
  pulse_10ms_cnt--
  pulse_10ms = 0
else
  pulse_10ms_cnt = pulse_10ms_cnt
  pulse_10ms = 0
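The three pulse generators can be chained in a Python model to confirm the periods. The reload value of 159 implies 160 pclk cycles per microsecond, i.e. a 160 MHz pclk; that frequency is inferred from the constant, not stated here. The model also lets each stage see the previous stage's pulse in the same cycle, which may differ by one cycle from a registered implementation. The class name is hypothetical.

```python
class PulseGenerator:
    """1us, 100us and 10ms pulse generators (behavioural sketch)."""

    def __init__(self):
        self.cnt_1us = 0
        self.cnt_100us = 0
        self.cnt_10ms = 0

    def step(self):
        """Advance one pclk cycle; return (pulse_1us, pulse_100us, pulse_10ms)."""
        if self.cnt_1us == 0:
            self.cnt_1us, p1 = 159, 1          # 160 pclk cycles per 1us
        else:
            self.cnt_1us, p1 = self.cnt_1us - 1, 0
        p100 = 0
        if p1:                                 # 100us stage counts 1us pulses
            if self.cnt_100us == 0:
                self.cnt_100us, p100 = 99, 1
            else:
                self.cnt_100us -= 1
        p10 = 0
        if p100:                               # 10ms stage counts 100us pulses
            if self.cnt_10ms == 0:
                self.cnt_10ms, p10 = 99, 1
            else:
                self.cnt_10ms -= 1
        return p1, p100, p10


gen = PulseGenerator()
pulses = [gen.step() for _ in range(32000)]           # 200us at 160 MHz
print([sum(p[k] for p in pulses) for k in range(3)])  # [200, 2, 1]
```

Over 32,000 cycles (200µs at the assumed clock) the model produces 200 1µs pulses, 2 100µs pulses and 1 10ms pulse, the first of each on cycle 0 because all counters reset to zero.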

15.4.6 Configuration registers 

The configuration registers in the TIM are programmed via the CPU interface. Refer to section 11.4.3 on page 70 for a description of the protocol and timing diagrams for reading and writing registers in the TIM.
Note that since addresses in SoPEC are byte aligned and the CPU only supports 32-bit register reads and 
writes, the lower 2 bits of the CPU address bus are not required to decode the address space for the TIM. 
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S5 



When reading a register that is less than 32 bits wide zeros should be returned on the upper unused bit(s) 
of tim^cujdata. Table 51 lists the configuration registers in the TIM block . 



Table 51. Timers Register Map

Address        Register                #Bits  Reset        Description
0x00           WatchDogUnitSel         3      0x0          Specifies the units used for the watchdog timer:
                                                           0 - 1 µs pulse
                                                           1 - 100 µs pulse
                                                           2 - 10 ms pulse
0x04           WatchDogTimer           32     0xFFFF_FFFF  Specifies the number of units to count before
                                                           the watchdog timer triggers
0x08           WatchDogIntThres        32     0x0000_0000  Specifies the threshold value below which the
                                                           watchdog timer issues an interrupt
0x0C-0x10      FreeRunCount[1:0]       2x32   0x0000_0000  Direct access to the free running counter register:
                                                           Bus 0 - Access to bits 31-0
                                                           Bus 1 - Access to bits 63-32
0x14 to 0x1C   GenCntStartValue[2:0]   3x32   0x0000_0000  Generic timer counter start value, number of
                                                           units to count before event
0x20 to 0x28   GenCntValue[2:0]        3x32   0x0000_0000  Direct access to generic timer counter registers
0x2C to 0x34   GenCntUnitSel[2:0]      3x2    0x0          Generic counter unit select. Selects the timing
                                                           units used with the corresponding counter:
                                                           0 - 1 µs pulse
                                                           1 - 100 µs pulse
                                                           2 - 10 ms pulse
                                                           3 - pclk
0x38 to 0x40   GenCntAuto[2:0]         3x1    0x0          Generic counter auto re-start select. When high,
                                                           the timer automatically restarts, otherwise the
                                                           timer stops.
0x44 to 0x4C   GenCntEnable[2:0]       3x1    0x0          Generic counter enable:
                                                           0 - Counter disabled
                                                           1 - Counter enabled
0x50           GenCntUserModeEnable    3      0x0          User mode access enable to generic timer
                                                           configuration registers. When 1, user access
                                                           is enabled:
                                                           Bit 0 - Generic timer 0
                                                           Bit 1 - Generic timer 1
                                                           Bit 2 - Generic timer 2
0x54           DebugSelect             6      0x00         Debug address select. Indicates the address
                                                           of the register to report on the tim_cpu_data
                                                           bus when it is not otherwise being used.
Read Only Registers
0x58           PulseTimerStatus        24     0x00         Current pulse timer values, and pulses:
                                                           6:0 - 1 us timer count
                                                           7 - 1 us pulse
                                                           14:8 - 100 us timer count
                                                           15 - 100 us pulse
                                                           22:16 - 10 ms timer count
                                                           23 - 10 ms pulse
15.4.6.1 Supervisor and user mode access

The configuration registers block examines the CPU access type (cpu_acode signal) and determines if the
access is allowed to that particular register, based on the configured user access registers. If an access is not
allowed, the block will issue a bus error by asserting the tim_cpu_berr signal.

The timers block is fully accessible in supervisor data mode: all registers can be written to and read from. In
user mode, access is denied to all registers in the block except for the generic timer configuration registers
that are granted user data access. User data access for a generic timer is granted by setting the corresponding
bit in the GenCntUserModeEnable register. This can only be changed in supervisor data mode. If a particular
timer is granted user data access then all registers for configuring that timer will be accessible. For
example, if timer 0 is granted user data access, the GenCntStartValue[0], GenCntUnitSel[0], GenCntAuto[0],
GenCntEnable[0] and GenCntValue[0] registers can all be written to and read from without any
restriction.

In user mode, attempts to access the configuration registers of a timer that has not been granted user data access will result in a bus error.

Table 52 details the access modes allowed for registers in the TIM block. In supervisor data mode all registers
are accessible. All forbidden accesses will result in a bus error (tim_cpu_berr asserted).



Table 52. TIM supervisor and user access modes

Address      Register                 Access permission
0x00         WatchDogUnitSel          Supervisor data mode only
0x04         WatchDogTimer            Supervisor data mode only
0x08         WatchDogIntThres         Supervisor data mode only
0x0C-0x10    FreeRunCount             Supervisor data mode only
0x14         GenCntStartValue[0]      GenCntUserModeEnable[0]
0x18         GenCntStartValue[1]      GenCntUserModeEnable[1]
0x1C         GenCntStartValue[2]      GenCntUserModeEnable[2]
0x20         GenCntValue[0]           GenCntUserModeEnable[0]
0x24         GenCntValue[1]           GenCntUserModeEnable[1]
0x28         GenCntValue[2]           GenCntUserModeEnable[2]
0x2C         GenCntUnitSel[0]         GenCntUserModeEnable[0]
0x30         GenCntUnitSel[1]         GenCntUserModeEnable[1]
0x34         GenCntUnitSel[2]         GenCntUserModeEnable[2]
0x38         GenCntAuto[0]            GenCntUserModeEnable[0]
0x3C         GenCntAuto[1]            GenCntUserModeEnable[1]
0x40         GenCntAuto[2]            GenCntUserModeEnable[2]
0x44         GenCntEnable[0]          GenCntUserModeEnable[0]
0x48         GenCntEnable[1]          GenCntUserModeEnable[1]
0x4C         GenCntEnable[2]          GenCntUserModeEnable[2]
0x50         GenCntUserModeEnable     Supervisor data mode only
0x54         DebugSelect              Supervisor data mode only
0x58         PulseTimerStatus         Supervisor data mode only
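The access rules of Table 52 reduce to a small decision function, sketched below. The function name tim_access_allowed is hypothetical. The text only describes supervisor data and user data accesses, so program-space access codes (00 and 10) are assumed here to be forbidden. The timer index relies on the register map above, where the per-timer registers from 0x14 to 0x4C interleave timers 0, 1 and 2 in consecutive words.

```python
SUPERVISOR_DATA = 0b11   # cpu_acode[1:0] encodings
USER_DATA = 0b01

GEN_TIMER_FIRST, GEN_TIMER_LAST = 0x14, 0x4C   # per-timer registers in Table 52


def tim_access_allowed(addr, acode, user_mode_enable):
    """Return True if allowed, False if tim_cpu_berr would be asserted.

    addr: byte address within the TIM block
    acode: value of cpu_acode[1:0]
    user_mode_enable: current 3-bit GenCntUserModeEnable value
    """
    if acode == SUPERVISOR_DATA:
        return True                      # every register is accessible
    if acode != USER_DATA:
        return False                     # program-space access: assumed forbidden
    if GEN_TIMER_FIRST <= addr <= GEN_TIMER_LAST:
        timer = ((addr - GEN_TIMER_FIRST) // 4) % 3   # 0x14 -> 0, 0x18 -> 1, 0x1C -> 2, ...
        return bool((user_mode_enable >> timer) & 1)
    return False                         # all other registers: supervisor data mode only
```

For instance, with GenCntUserModeEnable set to 0b010, a user data access to 0x18 (GenCntStartValue[1]) is allowed while one to 0x14 (GenCntStartValue[0]) raises a bus error.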
16 Clocking, Power and Reset (CPR)

The CPR block provides all of the clock, power enable and reset signals to the SoPEC device. 

16.1 Powerdown modes

The CPR block is capable of powering down certain sections of the SoPEC device. When a section is powered
down (i.e. put in sleep mode) no state is retained; the CPU must re-initialize the section before it can
be used again. The exact powerdown mechanism is undefined and is technology dependent.

For the purpose of powerdown, the SoPEC device is divided into three sections:



Table 53. Powerdown sectioning

Section                                        Blocks
Print Engine Pipeline Subsystem (Section 0)    CDU, CFU, LBD, SFU, TE, TFU, HCU, DNC, DWU, LLU, PHI
CPU-DRAM (Section 1)                           DRAM, CPU/MMU, DIU, TIM, ROM, LSS Interface
Comms Subsystem (Section 2)                    USB, ISI, DMA Ctrl, GPIO, PSS, ICU

16.1.1 Sleep mode 

Each section can be put into sleep mode by setting the corresponding bit in the SleepModeEnable register. 
To re-enable the section the sleep mode bit needs to be cleared and then the section should be reset by 
writing to the relevant bit in the ResetSection register. Each block within the section should then be re-con- 
figured by the CPU. 

If the CPU system is put into sleep mode, the SoPEC device will remain in sleep mode until a system level
reset is initiated from the reset pin, or a wakeup reset is initiated by the SCB block as a result of activity on
either the USB or ISI bus. If all sections are put into sleep mode, then only a system level reset initiated by
the reset pin will re-activate the SoPEC device.

Like all software resets in SoPEC, the ResetSection register is active-low, i.e. a 0 should be written to each
bit position requiring a reset. The ResetSection register is self-resetting.
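The re-enable sequence described above (clear the sleep bit, then pulse the active-low reset) might look like the following in firmware. This is a sketch under assumptions: wake_section is a hypothetical helper, read/write stand in for whatever CPR register access routine exists, and the offsets are taken from the CPR register map (Table 55). Re-configuring the blocks in the section is left to the caller.

```python
SLEEP_MODE_ENABLE = 0x00   # CPR register offsets (Table 55)
RESET_SECTION = 0x08
SECTION_MASK = 0x7         # one bit per powerdown section (Table 53)


def wake_section(read, write, section):
    """Bring one powerdown section out of sleep mode (hypothetical helper)."""
    if section not in (0, 1, 2):
        raise ValueError("SoPEC has powerdown sections 0 to 2")
    bit = 1 << section
    # 1. Clear the section's bit in SleepModeEnable, leaving other sections alone.
    write(SLEEP_MODE_ENABLE, read(SLEEP_MODE_ENABLE) & ~bit & SECTION_MASK)
    # 2. ResetSection is active-low: write 0 only at this section's bit position.
    #    The register is self-resetting, so no second write is needed to release it.
    write(RESET_SECTION, SECTION_MASK & ~bit)
    # 3. The caller must now re-configure every block in the section.
```

Waking section 2 with SleepModeEnable at 0b101 would write 0b001 to SleepModeEnable and then 0b011 to ResetSection.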

16.2 Reset source

The SoPEC device can be reset by a number of sources. When a reset from an external source is initiated,
the reset source register (ResetSrc) stores the reset source value. This register can then be used by the CPU
to determine the type of boot sequence required.
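A minimal sketch of how boot code might decode ResetSrc, assuming the bit assignments listed for ResetSrc in Table 55 (bit 0 external, bit 1 USB wakeup, bit 2 ISI wakeup, bit 3 watchdog). The function name decode_reset_src is hypothetical.

```python
# ResetSrc bit positions, per the CPR register map (Table 55)
RESET_SOURCES = {
    0: "external reset",
    1: "USB wakeup reset",
    2: "ISI wakeup reset",
    3: "watchdog timer reset",
}


def decode_reset_src(value):
    """Return the names of the reset sources flagged in a ResetSrc value."""
    return [name for bit, name in RESET_SOURCES.items() if (value >> bit) & 1]
```

A value of 0x1 therefore identifies an external (power-on) reset, which is the case the boot code in section 17.2 tests with ResetSrc == 1.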

16.3 Clock relationship

The crystal oscillator excites a 32MHz crystal through the xtalin and xtalout pins. The 32MHz output is
used by the PLL to derive the master VCO frequency of 960MHz. The master clock is then divided to produce
320MHz (clk320), 160MHz (clk160), 106MHz (clk106) and 48MHz (clk48) clock sources.
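The arithmetic behind these frequencies is shown below. The integer divide ratios relative to the 960MHz master clock are inferred from the stated frequencies rather than given in the text; note that clk106 is nominally 106.67MHz (960/9), and the 1.04ns master clock period matches Figure 53.

```python
VCO_MHZ = 960.0   # PLL master VCO frequency from the text

# Divide ratios relative to the master clock, inferred from the named clocks
DIVIDERS = {"clk320": 3, "clk160": 6, "clk106": 9, "clk48": 20}


def derived_clocks(vco_mhz=VCO_MHZ):
    """Frequency in MHz of each derived clock."""
    return {name: vco_mhz / div for name, div in DIVIDERS.items()}


period_ns = 1e3 / VCO_MHZ   # master clock period in ns (about 1.04)
```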

The phase relationship of each clock from the PLL will be defined. The relationship of the internal clocks
clk320, clk106, clk48 and clk160 to xtalin will be undefined. The clock tree generation should create insertion
delays so as to compensate for the phase difference of the clocks leaving the PLL. At the output of the
clock block, the skew between each pclk domain (pclk_section[2:0] and jclk) should be within the skew
tolerances of their respective domains (defined as less than the hold time of a D-type flip-flop).

The skew between doclk and phiclk should also be less than the skew tolerances of their respective
domains.

The usbclk is derived from the PLL output and has no relationship with the other clocks in the system and 
is considered asynchronous. 
There is no skew requirement between the pclk domains and the doclk and phiclk domains; they are
considered essentially asynchronous to each other.



[Figure 53: waveforms of the PLL master clock (1.04ns period), clk320, doclk, clk160, pclk/jclk, clk106 and phiclk, annotated with the PLL phase shift of clk320, clk160 and clk106 and the insertion delays of doclk, pclk/jclk and phiclk.]

Figure 53. SoPEC clock relationship
16.4 Implementation

16.4.1 Definitions of I/O



Table 54. CPR I/O definition

Port name              Pins  I/O  Description
Clocks and Resets
xtalin                 1     In   Crystal input, direct from IO pin.
xtalout                1     Out  Crystal output, direct to IO pin.
pclk_section[2:0]      3     Out  System clocks for each section
phiclk                 1     Out  Printhead interface clock (doclk/3) for the PHI block
doclk                  1     Out  Data out clock (2x pclk) for the PHI block
jclk                   1     Out  Gated version of system clock used to clock the JPEG decoder
                                  core in the CDU
usbclk                 1     Out  USB clock at 3 times the crystal input frequency, nominally at
                                  48 MHz
jclk_enable            1     In   Gating signal for jclk.
reset_n                1     In   Reset signal from the reset_n pin
usb_cpr_reset_n        1     In   Reset signal from the USB block
isi_cpr_reset_n        1     In   Reset signal from the ISI block
tim_cpr_reset_n        1     In   Reset signal from watchdog timer.
prst_n_section[2:0]    3     Out  System resets for each section, synchronous active low
phirst_n               1     Out  Reset for PHI block, synchronous to phiclk
dorst_n                1     Out  Reset for PHI block, synchronous to doclk
jrst_n                 1     Out  Reset for JPEG decoder core in CDU block, synchronous to jclk
usbrst_n               1     Out  Reset for the USB block, synchronous to usbclk
Test Input
test_clk               1     In   Test clock direct from external pin, for use in production test
                                  (scan test)
test_enable            1     In   Test enable. Direct from external pin. When high, production test
                                  mode is enabled.
CPU Interface
cpu_adr[3:2]           2     In   CPU address bus. Only 2 bits are required to decode the address
                                  space for the CPR block
cpu_dataout[31:0]      32    In   Shared write data bus from the CPU
cpr_cpu_data[31:0]     32    Out  Read data bus to the CPU
cpu_rwn                1     In   Common read/not-write signal from the CPU
cpu_cpr_sel            1     In   Block select from the CPU. When cpu_cpr_sel is high, both
                                  cpu_adr and cpu_dataout are valid
cpr_cpu_rdy            1     Out  Ready signal to the CPU. When cpr_cpu_rdy is high it indicates
                                  the last cycle of the access. For a write cycle this means
                                  cpu_dataout has been registered by the block and for a read cycle
                                  this means the data on cpr_cpu_data is valid.
cpr_cpu_berr           1     Out  Bus error signal to the CPU indicating an invalid access.
cpu_acode[1:0]         2     In   CPU Access Code signals. These decode as follows:
                                  00 - User program access
                                  01 - User data access
                                  10 - Supervisor program access
                                  11 - Supervisor data access
cpr_cpu_debug_valid    1     Out  Debug data valid on cpr_cpu_data bus. Active high
Miscellaneous
pwr_sleep_mode[2:0]    3     Out  Sleep mode section select



16.4.2 Configuration registers 

The configuration registers in the CPR are programmed via the CPU interface. Refer to section 11.4 on
page 69 for a description of the protocol and timing diagrams for reading and writing registers in the CPR.
Note that since addresses in SoPEC are byte aligned and the CPU only supports 32-bit register reads and
writes, the lower 2 bits of the CPU address bus are not required to decode the address space for the CPR.
When reading a register that is less than 32 bits wide, zeros should be returned on the upper unused bit(s)
of cpr_cpu_data. Table 55 lists the configuration registers in the CPR block.

The CPR block will only allow supervisor data mode accesses (i.e. cpu_acode[1:0] =
SUPERVISOR_DATA). All other accesses will result in cpr_cpu_berr being asserted.



Table 55. CPR Register Map

Address  Register          #Bits  Reset    Description
0x00     SleepModeEnable   3      0x0      Sleep mode enable. When high, a section of logic is
                                           powered down. Each bit controls a section.
0x04     ResetSrc          4      0x1 (a)  Reset source register, indicating the source of the
                                           last reset:
                                           Bit 0 - External reset
                                           Bit 1 - USB wakeup reset
                                           Bit 2 - ISI wakeup reset
                                           Bit 3 - Watchdog timer reset
0x08     ResetSection      3      0x7      Active-low synchronous reset for each section,
                                           self-resetting.
0x0C     DebugSelect       6      0x00     Debug address select. Indicates the address of
                                           the register to report on the cpr_cpu_data bus
                                           when it is not otherwise being used.
PLL Control (Asynchronous reset registers)
0x10     PLLTuneBits       10     0x23E    PLL tuning bits
0x14     PLLRangeA         4      0xF      PLLOUT A frequency selector (defaults to
                                           600MHz to 1250MHz)
0x18     PLLRangeB         3      0x7      PLLOUT B frequency selector (defaults to
                                           600MHz to 1250MHz)
0x1C     PLLMultiplier     5      0x25     PLL multiplier selector, defaults to refclk x 20

a. Reset value depends on reset source. External reset shown.



16.4.3 CPR Sub-block partition 



[Figure 54: the crystal oscillator (xtalin, xtalout, test_enable, test_clk) feeds the PLL, which supplies clk320, clk160 and clk48 to a bank of clock gates driven by gate_dom from the gate enable logic (jclk_enable, pwr_sleep_mode). The gated clocks form pclk_section[2:0], doclk, phiclk, usbclk and jclk. The reset logic (reset_n, usb_cpr_reset_n, isi_cpr_reset_n, tim_cpr_reset_n) produces reset_dom[6:0], and one reset synchronizer per clock domain generates dorst_n, phirst_n, usbrst_n, prst_n_section[2:0] and jrst_n. The configuration registers connect to the CPU, and a clock driver buffers the clock outputs.]

Figure 54. CPR block partition
16.4.4 Sync reset

The reset synchronizer retimes an asynchronous reset signal to the clock domain that it resets. The circuit
prevents the inactive edge of reset from occurring when the clock is rising.



[Figure 55: reset_dom is retimed by the synchronizer, clocked by pclk, to produce prst_n; the figure also shows the pclk, reset_dom and prst_n waveforms.]

Figure 55. Reset synchronizer logic
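The document does not give the synchronizer's gate-level design. A common circuit that meets the stated goal is a two-stage synchronizer that asserts reset asynchronously and releases it only on clock edges; the behavioural sketch below assumes that design (the class name ResetSynchronizer is hypothetical).

```python
class ResetSynchronizer:
    """Two-stage reset synchronizer: asynchronous assert, synchronous release.

    An assumed, commonly used circuit; not taken from the SoPEC design.
    """

    def __init__(self):
        self.ff1 = 0
        self.ff2 = 0   # drives prst_n

    def set_reset(self, reset_dom):
        # Asynchronous path: an active-low reset clears both stages immediately.
        if reset_dom == 0:
            self.ff1 = self.ff2 = 0

    def rising_edge(self, reset_dom):
        if reset_dom == 0:
            self.ff1 = self.ff2 = 0
        else:
            # Shift a constant 1 through two flops so that release happens
            # a full clock edge after reset_dom goes inactive.
            self.ff2, self.ff1 = self.ff1, 1

    @property
    def prst_n(self):
        return self.ff2
```

After reset_dom goes high, prst_n stays low for one more rising edge and is released on the second, so the inactive edge always coincides with a clock edge of the domain.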



16.4.5 Reset generator logic 

The reset generator logic is used to determine which clock domains should be reset, based on the configured
reset values (reset_section_n), the external reset (reset_n), the watchdog timer reset (tim_cpr_reset_n) and
resets from the SCB block (isi_cpr_reset_n, usb_cpr_reset_n). The reset direct from the IO pin (reset_n) is
synchronized and de-glitched before feeding the reset logic.

Resets from the SCB block reset everything except its own section (section 2); this allows data to be stored
in the PSS block for use after an SCB powerup initiated reset.



Table 56. Reset domains

Reset signal    Domain
reset_dom[0]    doclk domain
reset_dom[1]    phiclk domain
reset_dom[2]    usbclk domain
reset_dom[3]    Section 0 pclk domain
reset_dom[4]    Section 1 pclk domain
reset_dom[5]    Section 2 pclk domain
reset_dom[6]    jclk domain



The logic is given by

if (reset_n == 0) then
    reset_dom[6:0] = 0x00      // reset everything
    reset_src[3:0] = 0x01
elsif (usb_cpr_reset_n == 0) then
    reset_dom[6:0] = 0x20      // all except comms domain
    reset_src[3:0] = 0x02
elsif (isi_cpr_reset_n == 0) then
    reset_dom[6:0] = 0x20      // all except comms domain
    reset_src[3:0] = 0x04
elsif (tim_cpr_reset_n == 0) then
    reset_dom[6:0] = 0x00      // reset everything
    reset_src[3:0] = 0x08
else
    // propagate resets from reset section register
    reset_dom[6:0] = 0x7F      // default: no domain in reset
    if (reset_section_n[0] == 0) then
        reset_dom[3] = 0
    if (reset_section_n[1] == 0) then
        reset_dom[4] = 0
    if (reset_section_n[2] == 0) then
        reset_dom[5] = 0
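The priority chain above can be checked with a direct translation into a function. This is a sketch only: reset_generator is a hypothetical name, all reset inputs are active low, bit i of the returned reset_dom corresponds to row i of Table 56 (0 means held in reset), and None for reset_src means ResetSrc keeps its previous value. It follows the pseudocode as written here, including the 0x7F default in the final branch.

```python
def reset_generator(reset_n, usb_cpr_reset_n, isi_cpr_reset_n, tim_cpr_reset_n,
                    reset_section_n):
    """Return (reset_dom[6:0], reset_src[3:0] or None) for one evaluation."""
    if reset_n == 0:
        return 0x00, 0x01          # external reset: everything
    if usb_cpr_reset_n == 0:
        return 0x20, 0x02          # all except the section 2 (comms) pclk domain
    if isi_cpr_reset_n == 0:
        return 0x20, 0x04          # all except the section 2 (comms) pclk domain
    if tim_cpr_reset_n == 0:
        return 0x00, 0x08          # watchdog: everything
    reset_dom = 0x7F               # no domain in reset by default
    for section in range(3):
        if not (reset_section_n >> section) & 1:
            reset_dom &= ~(1 << (section + 3))   # section N maps to reset_dom[N+3]
    return reset_dom, None
```

For example, writing 0b110 to ResetSection (reset section 0 only) yields reset_dom = 0x77, holding just the section 0 pclk domain in reset.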



16.4.6 Gate enable logic 

The gate enable logic is a combinational logic block used to generate gating signals for each of SoPEC's
clock domains. The gate enable (gate_dom) is generated based on the configured sleep_mode_en and
the internally generated jclk_enable signal.

The logic is given by

// clock gating for sleep modes
gate_dom[5:3] = 0x7          // default to on
pwr_sleep_mode[2:0] = 0x0    // default to awake
for (i = 0; i < 3; i++) {
    if (sleep_mode_en[i] == 1) then
        gate_dom[i+3] = 0
        pwr_sleep_mode[i] = 1
}

// jclk and remaining
gate_dom[2:0] = 0x7
gate_dom[6] = ~(jclk_enable)
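The combinational function above can be sketched as follows (gate_enable is a hypothetical name). It mirrors the pseudocode as written, including its inversion of jclk_enable for gate_dom[6]; bit i of gate_dom enables the clock of reset domain i in Table 56.

```python
def gate_enable(sleep_mode_en, jclk_enable):
    """Return (gate_dom[6:0], pwr_sleep_mode[2:0]) following the pseudocode above."""
    gate_dom = 0x07                   # doclk, phiclk and usbclk domains always on
    pwr_sleep_mode = 0
    for i in range(3):
        if (sleep_mode_en >> i) & 1:
            pwr_sleep_mode |= 1 << i  # section i asleep: pclk_section[i] stays gated
        else:
            gate_dom |= 1 << (i + 3)  # section i awake: pclk_section[i] runs
    if not jclk_enable:               # gate_dom[6] = ~(jclk_enable), as written above
        gate_dom |= 1 << 6
    return gate_dom, pwr_sleep_mode
```

Putting only section 1 to sleep (sleep_mode_en = 0b010) gives gate_dom = 0x2F and pwr_sleep_mode = 0b010.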

16.4.7 Clock gate logic 

The clock gate logic is used to safely gate clocks without generating any glitches on the gated clock. When 
the enable is high the clock is active otherwise the clock is gated. 

[Figure 56: timing diagram of src_clk, gate_dom, gate_dom_retimed and gate_clock, and a schematic in which gate_dom is retimed against src_clk to form gate_dom_retimed, which then gates src_clk to produce gate_clock.]

Figure 56. Clock gate logic diagram
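Why the retiming matters can be seen in a sample-level model. The sketch assumes gate_dom is captured on each falling edge of src_clk (a latch that is transparent while the clock is low behaves equivalently); the document does not state which edge is used, and gated_clock is a hypothetical name. Because the gate only changes while the clock is low, every output pulse is a whole src_clk pulse.

```python
def gated_clock(src_clk, gate_dom):
    """Sample-by-sample model of glitch-free clock gating (assumed design).

    src_clk, gate_dom: equal-length lists of 0/1 samples.
    Returns the gate_clock samples.
    """
    out, retimed, prev = [], 0, 0
    for clk, en in zip(src_clk, gate_dom):
        if prev == 1 and clk == 0:     # falling edge of src_clk: capture the enable
            retimed = en               # gate_dom_retimed
        out.append(clk & retimed)      # gate_clock = src_clk AND gate_dom_retimed
        prev = clk
    return out
```

With a 4-sample clock period and gate_dom rising in the middle of a high phase, a plain AND of src_clk and gate_dom would emit a one-sample runt pulse; the retimed version waits for the next full clock pulse instead.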
16.4.8 Clock generator logic

The clock generator block contains the PLL, crystal oscillator, clock dividers and associated control and
test logic. The PLL VCO frequency is 960MHz, locked to a 32MHz refclk generated by the crystal oscillator.
In test mode the xtalin signal can be driven directly by the test clock generator; the test clock will be
reflected on the refclk signal to the PLL.

[Figure 57: test_enable, xtalin and xtalout connect to the crystal oscillator, which supplies refclk to the PLL. The PLL is controlled by pll_range_a, pll_range_b, pll_multiplier, pll_tune and prst_n, and produces pll_outa, pll_outb, pll_outc and pll_lock. Clock divider A generates clk320, clk160 and clk106; clock divider B generates clk48.]

Figure 57. PLL and Clock divider logic



16.4.8.1 Clock divider A

The clock divider A block generates the 320MHz, 160MHz and 106MHz clocks from the 320MHz input
clock (pll_outb) generated by the PLL. The divider flip-flops are asynchronously reset by the prst_n signal.
The dividers are enabled only when the PLL has acquired lock, as indicated by the pll_lock signal.

16.4.8.2 Clock divider B

The clock divider B block generates the 48MHz clock from the 96MHz input clock (pll_outa) generated by
the PLL. The divider flip-flops are asynchronously reset by the prst_n signal. The dividers are enabled only
when the PLL has acquired lock, as indicated by the pll_lock signal.
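Both dividers can be pictured as counter-based divide-by-N circuits. The sketch below is an illustration under assumptions: ClockDivider is a hypothetical name, the ratios (1, 2 and 3 for divider A from 320MHz; 2 for divider B from 96MHz) are inferred from the stated frequencies, and the output is simply high for the first ceil(N/2) input cycles of every N, so divide-by-3 is not 50% duty cycle here. It models the two controls the text does specify: the asynchronous prst_n reset and the pll_lock enable.

```python
class ClockDivider:
    """Counter-based divide-by-N clock divider (a behavioural sketch)."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("divide ratio must be at least 1")
        self.n = n
        self.count = 0

    def reset(self):
        """Asynchronous reset from prst_n."""
        self.count = 0

    def tick(self, pll_lock):
        """One input clock cycle; returns the divided clock level for that cycle."""
        if not pll_lock:
            return 0                  # divider held until the PLL has acquired lock
        level = 1 if self.count < (self.n + 1) // 2 else 0
        self.count = (self.count + 1) % self.n
        return level
```

A divide-by-3 instance clocked from the 320MHz pll_outb repeats the pattern 1, 1, 0, giving the nominal 106MHz clk106.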
17 ROM Block



17.1 Overview 

The ROM block interfaces to the CPU bus and contains the SoPEC boot code. The ROM block consists of
the CPU bus interface, the ROM macro and the ChipID macro. The current ROM size is 16 KBytes, implemented
as a 4096 x 32 macro. Access to the ROM is not cached because the CPU enjoys fast (no more than
one cycle slower than a cache access), unarbitrated access to the ROM.

Each SoPEC device is required to have a unique ChipID which is set by blowing fuses at manufacture.
IBM's 300mm ECID macro is to be used to implement the ChipID and this offers 112 bits of laser fuses.
The exact number of fuse bits to be used for the ChipID will be determined later, but all bits are made
available to the CPU. The ECID macro allows all 112 bits to be read out in parallel and the ROM block
will make all 112 bits available in the FuseChipID[N] registers, which are readable by the CPU in supervisor
mode only.
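Since 112 fuse bits do not fit in fewer than four 32-bit registers, software reading the ChipID has to combine several FuseChipID reads. The helper below (assemble_chip_id, hypothetical) assumes FuseChipID[0] holds fuse bits 31:0, FuseChipID[1] bits 63:32 and so on, with the upper 16 bits of FuseChipID[3] unused; the text does not specify this ordering.

```python
FUSE_BITS = 112
WORD_BITS = 32
NUM_WORDS = -(-FUSE_BITS // WORD_BITS)   # ceil(112 / 32) = 4 FuseChipID registers


def assemble_chip_id(words):
    """Combine FuseChipID[0..3] register values into one 112-bit ChipID.

    Assumes FuseChipID[i] holds fuse bits 32*i+31 .. 32*i.
    """
    if len(words) != NUM_WORDS:
        raise ValueError(f"expected {NUM_WORDS} register values")
    value = 0
    for i, word in enumerate(words):
        value |= (word & 0xFFFFFFFF) << (WORD_BITS * i)
    return value & ((1 << FUSE_BITS) - 1)   # drop the unused upper bits of the last word
```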



17.2 Boot operation

There are two boot scenarios for the SoPEC device, namely after power-on and after being awoken from
sleep mode. When the device is in sleep mode it is hoped that power will actually be removed from the
DRAM, CPU and most other peripherals, and so the program code will need to be freshly downloaded each
time the device wakes up from sleep mode. In order to reduce the wakeup boot time (and hence the perceived
print latency), certain data items are stored in the PSS block (see section 18). These data items
include the SHA-1 hash digest expected for the program(s) to be downloaded, the master/slave SoPEC id
and some configuration parameters (currently TBD). All of these data items are stored in the PSS by the
CPU prior to entering sleep mode. The SHA-1 value stored in the PSS is calculated by the CPU by
decrypting the signature of the downloaded program using the appropriate public key stored in ROM. This
compute-intensive decryption only needs to take place once as part of the power-on boot sequence;
subsequent wakeup boot sequences will simply use the resulting SHA-1 digest stored in the PSS. Note that the
digest only needs to be stored in the PSS before entering sleep mode and the PSS can be used for temporary
storage of any data at all other times.

The CPU is expected to be in supervisor mode for the entire boot sequence described by the pseudocode 
below. Note that the boot sequence has not been finalised but is expected to be close to the following: 

if (ResetSrc == 1) then                  // Reset was a power-on reset
    configure_sopec                      // need to configure peris (USB, ISI, DMA, ICU etc.)
// otherwise reset was a wakeup reset so peris etc. were already configured
PAUSE: wait until IrqSemaphore != 0      // i.e. wait until an interrupt has been serviced
if (IrqSemaphore == DMAChan0Msg) then
    parse_msg(DMAChan0MsgPtr)            // this routine will parse the message and take any
                                         // necessary action e.g. programming the DMAChannel1 registers
elsif (IrqSemaphore == DMAChan1Msg) then // program has been downloaded
    CalculatedHash = gen_sha1(ProgramLocn, ProgramSize)
    if (ResetSrc == 1) then
        ExpectedHash = sig_decrypt(ProgramSig)
    else
        ExpectedHash = PSSHash
    if (ExpectedHash == CalculatedHash) then
        jmp(ProgramLocn)                 // transfer control to the downloaded program
    else
        send_host_msg("Program Authentication Failed")
        goto PAUSE
elsif (IrqSemaphore == timeout) then     // nothing has happened
    if (ResetSrc == 1) then
        sleep_mode()                     // put SoPEC into sleep mode to be woken up by USB/ISI activity
    else                                 // we were woken up but nothing happened
        reset_sopec(PowerOnReset)
else
    goto PAUSE
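The authentication step of the boot sequence can be sketched with Python's standard hashlib and hmac modules. This is an illustration, not the ROM code: authenticate and expected_digest are hypothetical names, the signature decryption (sig_decrypt) is passed in as a callable because its algorithm and key handling are not described here, and the digest is compared in constant time as a general good practice.

```python
import hashlib
import hmac

POWER_ON_RESET = 0x1   # ResetSrc value tested by the boot pseudocode


def expected_digest(reset_src, decrypt_signature, pss_hash):
    """Select ExpectedHash as in the pseudocode."""
    if reset_src == POWER_ON_RESET:
        # Power-on boot: perform the expensive signature decryption (sig_decrypt).
        return decrypt_signature()
    # Wakeup boot: reuse the digest the CPU stored in the PSS before sleeping.
    return pss_hash


def authenticate(program, reset_src, decrypt_signature, pss_hash):
    """Return True if control may be transferred to the downloaded program."""
    calculated = hashlib.sha1(program).digest()   # gen_sha1(ProgramLocn, ProgramSize)
    expected = expected_digest(reset_src, decrypt_signature, pss_hash)
    return hmac.compare_digest(calculated, expected)
```

On a wakeup reset the decryption callable is never invoked, which is the saving the PSS-stored digest is meant to provide.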

The boot code places no restrictions on the activity of any programs downloaded and authenticated by it
other than those imposed by the configuration of the MMU, i.e. the principal function of the boot code is to
authenticate that any programs downloaded by it are from a trusted source. It is the responsibility of the
downloaded program to ensure that any code it downloads is also authenticated and that the system
remains secure. The downloaded program code is also responsible for setting the SoPEC ISIId (see section
12.7 for a description of the ISIId) in a multi-SoPEC system. See the "SoPEC Security Overview" document
[9] for more details of the SoPEC security features.

17.3 Implementation 

17.3.1 Definitions of I/O



Table 57. ROM Block I/O

Port name             Pins  I/O  Description
Clocks and Resets
prst_n                1     In   Global reset. Synchronous to pclk, active low.
pclk                  1     In   Global clock
CPU Interface
cpu_adr[15:2]         14    In   CPU address bus. Only 14 bits are required to decode the address
                                 space for this block.
rom_cpu_data[31:0]    32    Out  Read data bus to the CPU
cpu_rwn               1     In   Common read/not-write signal from the CPU
cpu_acode[1:0]        2     In   CPU Access Code signals. These decode as follows:
                                 00 - User program access
                                 01 - User data access
                                 10 - Supervisor program access
                                 11 - Supervisor data access
cpu_rom_sel           1     In   Block select from the CPU. When cpu_rom_sel is high, cpu_adr is
                                 valid
rom_cpu_rdy           1     Out  Ready signal to the CPU. When rom_cpu_rdy is high it indicates
                                 the last cycle of the access. For a read cycle this means the data on
                                 rom_cpu_data is valid.
rom_cpu_berr          1     Out  ROM bus error signal to the CPU indicating an invalid access.



17.3.2 Configuration registers 

The ROM block will only allow read accesses to the FuseChipID registers with supervisor data space
permissions (i.e. cpu_acode[1:0] = 11). All other accesses of the FuseChipID registers will result in
rom_cpu_berr being asserted. The ROM block allows all read accesses to the ROM itself (i.e. supervisor or
user, data or program accesses). The CPU subsystem bus slave interface is described in more detail in
section 9.4.3.



Table 58. ROM Block Register Map

Address             Register        #Bits  Reset  Description
0x8000 to 0x800C    FuseChipID[N]   32     n/a    Value of corresponding fuse bits. (Read only)



17.3.3 Sub-Block Partition 

IBM offers two variants of its ROM macros: a high-performance version (ROMHD) and a low-power
version (ROMLD). It is likely that the low-power version will be used unless some implementation issue
requires the high-performance version. Both versions offer the same bit density. The sub-block partition
diagram below does not include the clocking and test signals for the ROM or ECID macros. The CPU subsystem
bus interface is described in more detail in section 11.4.3.



[Figure 58: the CPU bus interface connects to the ROM macro (4096 x 32) via rom_adr and rom_data, and to the fuse bits of the IBM 300mm ECID macro via fuse_reg_adr and fuse_data. On the CPU side it connects to cpu_adr, rom_cpu_data, cpu_rom_sel, cpu_rwn, rom_cpu_rdy, cpu_acode and rom_cpu_berr.]

Figure 58. Sub-block partition of the ROM block

17.3.4 Sub-block signal definition

Table 59. ROM Block Internal signals

Port name            Width  Description
Clocks and Resets
prst_n               1      Global reset. Synchronous to pclk, active low.
pclk                 1      Global clock
Internal Signals
rom_adr[11:0]        12     ROM address bus
                     1      Select signal to the ROM macro instructing it to access the location
                            at rom_adr
rom_oe               1      Output enable signal to the ROM block
rom_data[31:0]       32     Data bus from the ROM macro to the CPU bus interface
rom_dvalid           1      Signal from the ROM macro indicating that the data on rom_data is
                            valid for the address on rom_adr
fuse_data[31:0]      32     Data from the FuseChipID[N] register addressed by fuse_reg_adr
fuse_reg_adr[1:0]    2      Indicates which of the FuseChipID registers is being addressed
18 Power Safe Storage (PSS) Block

18.1 Overview 

The PSS block provides 128 bytes of storage space that will maintain its state when the rest of the SoPEC
device is in sleep mode. The PSS is expected to be used primarily for the storage of decrypted signatures
associated with downloaded program code, but it can also be used to store any information that needs
to survive sleep mode (e.g. configuration details). Note that the signature digest only needs to be stored in
the PSS before entering sleep mode and the PSS can be used for temporary storage of any data at all other
times.

Prior to entering sleep mode the CPU should store all of the information it will need on exiting sleep mode
in the PSS. On emerging from sleep mode the boot code in ROM will read the ResetSrc register in the CPR
block to determine which reset source caused the wakeup. The reset source information indicates whether
or not the PSS contains valid stored data, and the PSS data determines the type of boot sequence to execute.
If for any reason a full power-on boot sequence should be performed (e.g. the printer driver has been
updated), then this is simply achieved by initiating a full software reset.



18.2 Implementation 

The storage area of the PSS block will be implemented as a 128-byte register array. The array is located
from PSS_base through to PSS_base+0x7F in the address map. The PSS block will only allow read or
write accesses with supervisor data space permissions (i.e. cpu_acode[1:0] = 11). All other accesses will
result in pss_cpu_berr being asserted. The CPU subsystem bus slave interface is described in more detail
in section 11.4.3.
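A behavioural sketch of the PSS register array follows. The names PowerSafeStorage and BusError are hypothetical; a raised BusError stands in for pss_cpu_berr. The 32 words of 4 bytes come from the 128-byte size, matching the 5-bit cpu_adr[6:2] decode in the I/O table; offsets are relative to PSS_base.

```python
SUPERVISOR_DATA = 0b11   # cpu_acode[1:0] for supervisor data access


class BusError(Exception):
    """Stands in for pss_cpu_berr being asserted."""


class PowerSafeStorage:
    """128-byte PSS register array (a behavioural sketch, not the RTL)."""

    WORDS = 32   # 128 bytes / 4 bytes per 32-bit register

    def __init__(self):
        self.mem = [0] * self.WORDS

    def _index(self, offset, acode):
        if acode != SUPERVISOR_DATA:
            raise BusError("PSS allows supervisor data accesses only")
        if not 0 <= offset <= 0x7F:
            raise BusError("offset outside PSS_base to PSS_base+0x7F")
        return (offset >> 2) & 0x1F   # cpu_adr[6:2]; byte offset bits 1:0 not decoded

    def read(self, offset, acode):
        return self.mem[self._index(offset, acode)]

    def write(self, offset, value, acode):
        self.mem[self._index(offset, acode)] = value & 0xFFFFFFFF
```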
18.2.1 Definitions of I/O



Table 60. PSS Block I/O

Port name             Pins  I/O  Description
Clocks and Resets
prst_n                1     In   Global reset. Synchronous to pclk, active low.
pclk                  1     In   Global clock
CPU Interface
cpu_adr[6:2]          5     In   CPU address bus. Only 5 bits are required to decode the address
                                 space for this block.
cpu_dataout[31:0]     32    In   Shared write data bus from the CPU
pss_cpu_data[31:0]    32    Out  Read data bus to the CPU
cpu_rwn               1     In   Common read/not-write signal from the CPU
cpu_acode[1:0]        2     In   CPU Access Code signals. These decode as follows:
                                 00 - User program access
                                 01 - User data access
                                 10 - Supervisor program access
                                 11 - Supervisor data access
cpu_pss_sel           1     In   Block select from the CPU. When cpu_pss_sel is high, both cpu_adr
                                 and cpu_dataout are valid
pss_cpu_rdy           1     Out  Ready signal to the CPU. When pss_cpu_rdy is high it indicates the
                                 last cycle of the access. For a read cycle this means the data on
                                 pss_cpu_data is valid.
pss_cpu_berr          1     Out  PSS bus error signal to the CPU indicating an invalid access.
19 Low Speed Serial Interface (LSS)



19.1 Overview 



The Low Speed Serial Interface (LSS) provides a mechanism for the internal SoPEC CPU to communicate
with external QA chips via two independent LSS buses. The LSS communicates through the GPIO block
to the QA chips. This allows the QA chip pins to be reused in multi-SoPEC environments. The LSS Master
system-level interface is illustrated in Figure 59. Note that multiple QA chips are allowed on each LSS
bus.

[Figure 59: within SoPEC, the CPU connects over the CPU sub-system bus to the LSS Master, which reaches LSS bus 0 and LSS bus 1 through the GPIO block; QA Chips 0 to 3 are attached to the two buses.]

Figure 59. LSS master system-level interface



19.2 QA communication

The SoPEC data interface to the QA Chips is a low speed, 2 pin, synchronous serial bus. Data is transferred
to the QA chips via the lss_data pin synchronously with the lss_clk pin. When lss_clk is high, the
data on lss_data is deemed to be valid. Only the LSS master in SoPEC can drive the lss_clk pin; this pin is
an input only to the QA chips. The LSS block must be able to interface with an open-collector pull-up bus.
This means that when the LSS block should transmit a logical zero it will drive 0 on the bus, but when it
should transmit a logical 1 it will leave high-impedance on the bus (i.e. it doesn't drive the bus). If all the
agents on the LSS bus adhere to this protocol then there will be no issues with bus contention.
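The reason this protocol avoids contention is that the line behaves as a wired-AND: any agent driving 0 wins, and the pull-up supplies a 1 only when every agent has released the line. A minimal sketch (lss_bus_level, transmit and RELEASED are hypothetical names):

```python
RELEASED = None   # an agent that is not driving the line (high impedance)


def transmit(bit):
    """What an agent puts on the bus to send one bit under this protocol."""
    return 0 if bit == 0 else RELEASED


def lss_bus_level(drivers):
    """Resolve an open-collector line with a pull-up (wired-AND).

    drivers: one entry per agent, either 0 (pulling low) or RELEASED.
    A logical 1 is never driven; it comes from the pull-up.
    """
    for d in drivers:
        if d not in (0, RELEASED):
            raise ValueError("agents may only drive 0 or release the line")
    return 0 if 0 in drivers else 1
```

For example, if one agent transmits a 1 while another transmits a 0, the line reads 0 and no agent is ever driving against another.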

The LSS block controls all communication to and from the QA chips. The LSS block is the bus master in
all cases. The LSS block interprets a command register set by the SoPEC CPU, initiates transactions to the
QA chip in question and optionally accepts return data. Any return information is presented through the
configuration registers to the SoPEC CPU. The LSS block indicates to the CPU the completion of a command
or the occurrence of an error via an interrupt.

19.2.1 Start and stop conditions 

All transmissions on the LSS bus are initiated by the LSS master issuing a START condition and terminated by the LSS master issuing a STOP condition; only the LSS master generates START and STOP conditions. As illustrated in Figure 60, a START condition corresponds to a high to low transition on lss_data while lss_clk is high. A STOP condition corresponds to a low to high transition on lss_data while lss_clk is high.
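A receiver that samples both lines can tell START and STOP apart from ordinary data changes by checking whether lss_clk was high across the data transition. This is a minimal sketch over pairs of consecutive samples; the enum and function names are illustrative only.

```c
/* Events a receiver can derive from two consecutive (clk, data) samples. */
typedef enum { LSS_EV_NONE, LSS_EV_START, LSS_EV_STOP } lss_event_t;

/* START: lss_data falls 1 -> 0 while lss_clk stays high.
 * STOP:  lss_data rises 0 -> 1 while lss_clk stays high.
 * Data changes while lss_clk is low are ordinary bit changes. */
static lss_event_t lss_detect_condition(int prev_clk, int prev_data,
                                        int clk, int data)
{
    if (prev_clk == 1 && clk == 1) {
        if (prev_data == 1 && data == 0)
            return LSS_EV_START;
        if (prev_data == 0 && data == 1)
            return LSS_EV_STOP;
    }
    return LSS_EV_NONE;
}
```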



[Figure 60 diagram: lss_data and lss_clk waveforms marking the START condition and the STOP condition.]

Figure 60. START and STOP conditions



19.2.2 Data transfer 

Data is transferred on the LSS bus via a byte-oriented protocol. Bytes are transmitted serially. Each byte is sent most significant bit (MSB) first through to least significant bit (LSB) last. One clock pulse is generated for each data bit transferred. Each byte must be followed by an acknowledge bit.

The data on lss_data must be stable during the HIGH period of the lss_clk clock. Data may only change when lss_clk is low. A transmitter outputs data after the falling edge of lss_clk and a receiver inputs the data at the rising edge of lss_clk. This data is only considered as a valid data bit at the next lss_clk falling edge, provided a START or STOP is not detected in the period before the next lss_clk falling edge. All clock pulses are generated by the LSS block. The transmitter releases the lss_data line (high) during the acknowledge clock pulse (ninth clock pulse). The receiver must pull down the lss_data line during the acknowledge clock pulse so that it remains stable low during the HIGH period of this clock pulse.
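The bit ordering and acknowledge rule above can be captured in two small helpers: one assembles the eight bits sampled at rising edges of lss_clk (MSB first) into a byte, the other interprets the level seen on the ninth clock. Both are hypothetical helpers for illustration, not SoPEC logic.

```c
#include <stdint.h>

/* Assemble one byte from 8 data bits sampled at successive rising edges
 * of lss_clk, most significant bit first. */
static uint8_t lss_assemble_byte(const int bits[8])
{
    uint8_t value = 0;
    for (int i = 0; i < 8; i++)
        value = (uint8_t)((value << 1) | (bits[i] & 1));
    return value;
}

/* Interpret the level sampled on the ninth (acknowledge) clock pulse.
 * The transmitter has released the line, so it reads 0 only if the
 * receiver pulled it low (ACK); a released line reads 1 (not-acknowledge). */
static int lss_is_ack(int ninth_bit_level)
{
    return ninth_bit_level == 0;
}
```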

Data transfers follow the format shown in Figure 61. The first byte sent by the LSS master after a START condition is a primary id byte, where bits 7-2 form a 6-bit primary id (0 is a global id and will address all QA Chips on a particular LSS bus), bit 1 is an even parity bit for the primary id, and bit 0 forms the read/write sense. Bit 0 is high if the following command is a read to the primary id given or low for a write command to that id. An acknowledge is generated by the QA chip(s) corresponding to the given id (if such a chip exists) by driving the lss_data line low synchronous with the LSS master generated ninth lss_clk.
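Building the primary id byte is a small bit-packing exercise. The sketch assumes that "even parity" means bits 7:1 together contain an even number of 1s, so the parity bit is the XOR of the six id bits; the function name is hypothetical.

```c
#include <stdint.h>

/* Build the primary id byte: bits 7:2 = 6-bit primary id, bit 1 = even
 * parity over the id (assumed: id bits plus parity hold an even number
 * of 1s), bit 0 = read/write sense (1 = read, 0 = write). */
static uint8_t lss_primary_id_byte(uint8_t id, int is_read)
{
    uint8_t parity = 0;
    id &= 0x3F;                         /* primary id is only 6 bits */
    for (int i = 0; i < 6; i++)
        parity ^= (uint8_t)((id >> i) & 1);
    return (uint8_t)((id << 2) | (parity << 1) | (is_read ? 1 : 0));
}
```

For example, a write to the global id 0 is simply 0x00, while a read from id 5 (two 1 bits, parity 0) is 0x15.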



[Figure 61 diagram: lss_data and lss_clk showing START condition, ID byte[7:1], R/W, ACK, DATA, ACK, DATA, ACK, STOP condition.]

Figure 61. LSS transfer of 2 data bytes



19.2.3 Write procedure 



The protocol for a write access to a QA Chip over the LSS bus is illustrated in Figure 62 below. The LSS master in SoPEC initiates the transaction by generating a START condition on the LSS bus. It then transmits the primary id byte with a 0 in bit 0 to indicate that the following command is a write to the primary id. An acknowledge is generated by the QA chip corresponding to the given primary id. The LSS master will clock out M data bytes with the slave QA Chip acknowledging each successful byte written. Once the slave QA chip has acknowledged the last data byte, the LSS master issues a STOP condition to complete the transfer. The QA chip gathers the M data bytes together and interprets them as a command. See the QA Chip Interface Specification [8] for more details on the format of the commands used to communicate with the QA chip. Note that the QA chip is free to not acknowledge any byte transmitted. The LSS master should respond by issuing an interrupt to the CPU to indicate this error. The CPU should then generate a STOP condition on the LSS bus to gracefully complete the transaction.
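The write procedure can be summarised as a sequence of bus symbols using the Figure 62 legend. This sketch models only the symbol order, with the slave's responses supplied by the caller; the trace format and helper names are assumptions made for illustration. On a not-acknowledge the trace stops, because the CPU, not the LSS master, is then responsible for the STOP.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append s to out if it fits (sketch-level bounds handling). */
static void trace_append(char *out, size_t cap, const char *s)
{
    size_t len = strlen(out), add = strlen(s);
    if (len + add < cap)
        memcpy(out + len, s, add + 1);
}

/* Symbolic trace of an LSS write of m data bytes:
 * S = START, ID/W = primary id byte with bit 0 = 0, D = data byte,
 * A = ACK, N = NACK, P = STOP. acks[0] is the slave's response to the
 * id byte and acks[1..m] its responses to the data bytes. */
static void lss_write_trace(int m, const bool *acks, char *out, size_t cap)
{
    out[0] = '\0';
    trace_append(out, cap, acks[0] ? "S ID/W A" : "S ID/W N");
    if (!acks[0])
        return;                          /* CPU must now issue the STOP */
    for (int i = 1; i <= m; i++) {
        trace_append(out, cap, acks[i] ? " D A" : " D N");
        if (!acks[i])
            return;                      /* error interrupt, CPU issues STOP */
    }
    trace_append(out, cap, " P");
}
```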



[Figure 62 diagram: S, ID byte[7:1], 0, A, followed by 8-bit data bytes (Byte 0 to Byte M), each acknowledged, then P. Legend: S = Start condition, A = Ack, N = Nack, P = Stop condition; shaded bits driven by slave.]

Figure 62. Example of LSS write to a QA Chip



19.2.4 Read procedure 

The LSS master in SoPEC initiates the transaction by generating a START condition on the LSS bus. It then transmits the primary id byte with a 1 in bit 0 to indicate that the following command is a read to the primary id. An acknowledge is generated by the QA chip corresponding to the given primary id. The LSS master releases the lss_data bus and proceeds to clock the expected number of bytes from the QA chip, with the LSS master acknowledging each successful byte read. The last expected byte is not acknowledged by the LSS master. It then completes the transaction by generating a STOP condition on the LSS bus. See the QA Chip Interface Specification [8] for more details on the format of the commands used to communicate with the QA chip.
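The read procedure differs mainly in who acknowledges: the slave acknowledges the id byte, then the master acknowledges each received byte except the last. The same kind of symbolic trace illustrates this; the helper names are hypothetical and the id acknowledge is assumed to succeed.

```c
#include <stddef.h>
#include <string.h>

/* Append s to out if it fits (sketch-level bounds handling). */
static void trace_append(char *out, size_t cap, const char *s)
{
    size_t len = strlen(out), add = strlen(s);
    if (len + add < cap)
        memcpy(out + len, s, add + 1);
}

/* Symbolic trace of an LSS read of m >= 1 bytes (Figure 63 legend):
 * the slave ACKs the id byte (bit 0 = 1), the master ACKs every data
 * byte except the last, which it NACKs before issuing STOP. */
static void lss_read_trace(int m, char *out, size_t cap)
{
    out[0] = '\0';
    trace_append(out, cap, "S ID/R A");
    for (int i = 0; i < m; i++)
        trace_append(out, cap, (i == m - 1) ? " D N" : " D A");
    trace_append(out, cap, " P");
}
```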



[Figure 63 diagram: LSS read transfer ending with Byte M. Legend: S = Start condition, A = Ack, N = Nack, P = Stop condition; shaded bits driven by slave.]

Figure 63. Example of LSS read from QA Chip
19.3 Implementation

A block diagram of the LSS master is given in Figure 64. It consists of a block of configuration registers that are programmed by the CPU and two identical LSS master units that generate the signalling protocols on the two LSS buses as well as interrupts to the CPU. The CPU initiates and terminates transactions on the LSS buses by writing an appropriate command to the command register, writes bytes to be transmitted to a fifo and reads bytes received from a fifo, and checks the sources of interrupts by reading status registers.



[Figure 64 diagram: CPU, configuration registers, LSS bus 0 master unit and LSS bus 1 master unit within the Low Speed Serial Interface block, connected to the GPIO and the ICU.]

Figure 64. LSS block diagram
19.3.1 Definitions of IO

Table 61. LSS IO pins definitions

Port name | Pins | I/O | Description
Clocks and Resets
pclk | 1 | In | System clock
prst_n | 1 | In | System reset, synchronous active low
CPU Interface
cpu_rwn | 1 | In | Common read/not-write signal from the CPU
cpu_adr[6:2] | 5 | In | CPU address bus. Only 5 bits are required to decode the address space for this block
cpu_dataout[31:0] | 32 | In | Shared write data bus from the CPU
cpu_acode[1:0] | 2 | In | CPU access code signals. cpu_acode[0] - Program (0) / Data (1) access; cpu_acode[1] - User (0) / Supervisor (1) access
cpu_lss_sel | 1 | In | Block select from the CPU. When cpu_lss_sel is high both cpu_adr and cpu_dataout are valid
lss_cpu_rdy | 1 | Out | Ready signal to the CPU. When lss_cpu_rdy is high it indicates the last cycle of the access. For a write cycle this means cpu_dataout has been registered by the LSS block and for a read cycle this means the data on lss_cpu_data is valid.
lss_cpu_berr | 1 | Out | LSS bus error signal to the CPU.
lss_cpu_data[31:0] | 32 | Out | Read data bus to the CPU
lss_cpu_debug_valid | 1 | Out | Active high. Indicates the presence of valid debug data on lss_cpu_data.
GPIO for LSS buses
lss_gpio_do[1:0] | 2 | Out | LSS bus data output. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
gpio_lss_di[1:0] | 2 | In | LSS bus data input. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
lss_gpio_e[1:0] | 2 | Out | LSS bus data output enable, active high. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
lss_gpio_clk[1:0] | 2 | Out | LSS bus clock output. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
ICU Interface
lss_icu_irq[1:0] | 2 | Out | LSS interrupt requests. Bit 0 - interrupt associated with LSS bus 0; Bit 1 - interrupt associated with LSS bus 1
19.3.2 Configuration registers

The configuration registers in the LSS block are programmed via the CPU interface. Refer to section 11.4 on page 69 for the description of the protocol and timing diagrams for reading and writing registers in the LSS block. Note that since addresses in SoPEC are byte aligned and the CPU only supports 32-bit register reads and writes, the lower 2 bits of the CPU address bus are not required to decode the address space for the LSS block. Table 62 lists the configuration registers in the LSS block. When reading a register that is less than 32 bits wide, zeros should be returned on the upper unused bit(s) of lss_cpu_data.

The input cpu_acode signal indicates whether the current CPU access is supervisor, user, program or data. The configuration registers in the LSS block can only be read or written by a supervisor data access, i.e. when cpu_acode equals b11. If the current access is a supervisor data access then the LSS responds by asserting lss_cpu_rdy for a single clock cycle.

If the current access is anything other than a supervisor data access, then the LSS generates a bus error by asserting lss_cpu_berr for a single clock cycle instead of lss_cpu_rdy, as shown in section 11.4 on page 69. A write access will be ignored, and a read access will return zero.
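The address decode and access check described above can be sketched in a few lines of C. The register file array and function names are hypothetical; only the bit positions (cpu_adr[6:2], cpu_acode[1:0]) and the accept/reject behaviour come from this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register word index used by the decode logic: byte address bits 6:2.
 * The lower 2 bits are unused because every access is 32 bits wide. */
static unsigned lss_reg_index(uint32_t byte_addr)
{
    return (byte_addr >> 2) & 0x1F;
}

/* cpu_acode[0]: 0 = program, 1 = data; cpu_acode[1]: 0 = user,
 * 1 = supervisor. Only a supervisor data access (b11) is accepted. */
static bool lss_access_ok(unsigned cpu_acode)
{
    return (cpu_acode & 0x3) == 0x3;
}

/* Read response: a rejected access sets *berr (lss_cpu_berr instead of
 * lss_cpu_rdy) and returns zero. 'regs' stands in for the register file. */
static uint32_t lss_read(const uint32_t regs[32], uint32_t byte_addr,
                         unsigned cpu_acode, bool *berr)
{
    *berr = !lss_access_ok(cpu_acode);
    return *berr ? 0 : regs[lss_reg_index(byte_addr)];
}
```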



Table 62. LSS Control Registers

Address | Register | #Bits | Reset | Description
Control registers
0x00 | Reset | 1 | 0x1 | A write to this register causes a reset of the LSS.
0x04 | LssClockHighPeriod | 16 | 0x00C8 | High period of lss_clk expressed as a number of pclk cycles. Transmission over the LSS bus is at a nominal rate of 400 kHz, corresponding to a high period of 200 pclk (160 MHz) cycles for a 50/50 duty cycle.
0x08 | LssClockLowPeriod | 16 | 0x00C8 | Low period of lss_clk expressed as a number of pclk cycles. Transmission over the LSS bus is at a nominal rate of 400 kHz, corresponding to a low period of 200 pclk (160 MHz) cycles for a 50/50 duty cycle.
LSS bus 0 registers
0x10 | Lss0IntStatus | 3 | 0x0 | LSS bus 0 interrupt status register. Bit 0 - command completed successfully; Bit 1 - error during processing of command, not-acknowledge received after transmission of primary id byte on LSS bus 0; Bit 2 - error during processing of command, not-acknowledge received after transmission of data byte on LSS bus 0. A 1 in a bit of the lss0_status_set signal causes the corresponding bit in the Lss0IntStatus register to be set. All the bits in Lss0IntStatus are cleared when the Lss0Cmd register gets written to. (Read only register)
0x14 | Lss0CurrentState | 4 | 0x0 | Gives the current state of the LSS bus 0 state machine. (Read only register) (Encoding will be specified upon state machine implementation)
0x18 | Lss0Cmd | 22 | 0x00_0000 | Command register defining the sequence of events to perform on LSS bus 0 before interrupting the CPU. A write to this register causes all the bits in the Lss0IntStatus register to be cleared as well as generating a lss0_new_cmd pulse.
0x1C-0x2C | Lss0Buffer[4:0] | 5x32 | 0x0000_0000 | LSS data buffer. Should be filled with transmit data before a transmit command, or read data bytes received after a valid read command.
LSS bus 1 registers
0x30 | Lss1IntStatus | 3 | 0x0 | LSS bus 1 interrupt status register. Bit 0 - command completed successfully; Bit 1 - error during processing of command, not-acknowledge received after transmission of primary id byte on LSS bus 1; Bit 2 - error during processing of command, not-acknowledge received after transmission of data byte on LSS bus 1. A 1 in a bit of the lss1_status_set signal causes the corresponding bit in the Lss1IntStatus register to be set. All the bits in Lss1IntStatus are cleared when the Lss1Cmd register gets written to. (Read only register)
0x34 | Lss1CurrentState | 4 | 0x0 | Gives the current state of the LSS bus 1 state machine. (Read only register) (Encoding will be specified upon state machine implementation)
0x38 | Lss1Cmd | 22 | 0x00_0000 | Command register defining the sequence of events to perform on LSS bus 1 before interrupting the CPU. A write to this register causes all the bits in the Lss1IntStatus register to be cleared as well as generating a lss1_new_cmd pulse.
0x3C-0x4C | Lss1Buffer[4:0] | 5x32 | 0x0000_0000 | LSS data buffer. Should be filled with transmit data before a transmit command, or read data bytes received after a valid read command.
Debug register
0x50 | LssDebugSel | 5 | 0x00 | Selects register for debug output. This value is used as the input to the register decode logic instead of cpu_adr[6:2] when the LSS block is not being accessed by the CPU, i.e. when cpu_lss_sel is 0. The output lss_cpu_debug_valid is asserted to indicate that the data on lss_cpu_data is valid debug data. This data can be multiplexed onto chip pins during debug mode.



19.3.2.1 LSS command registers

The LSS command registers define a sequence of events to perform on the respective LSS bus before issuing an interrupt to the CPU. There is a separate command register and interrupt for each LSS bus. The format of the command is given in Table 63. The CPU writes to the command register to initiate a sequence of events on an LSS bus. Once the sequence of events has completed or an error has occurred, an interrupt is sent back to the CPU.

Some example commands are:

• a single START condition (Start = 1, IdByteEnable = 0, RdWrEnable = 0, Stop = 0)

• a single STOP condition (Start = 0, IdByteEnable = 0, RdWrEnable = 0, Stop = 1)

• a START condition followed by transmission of the id byte (Start = 1, IdByteEnable = 1, RdWrEnable = 0, Stop = 0, IdByte contains primary id byte)

• a write transfer of 20 bytes from the data buffer (Start = 0, IdByteEnable = 0, RdWrEnable = 1, RdWrSense = 0, Stop = 0, TxRxByteCount = 20)

• a read transfer of 8 bytes into the data buffer (Start = 0, IdByteEnable = 0, RdWrEnable = 1, RdWrSense = 1, ReadNack = 0, Stop = 0, TxRxByteCount = 8)

• a complete read transaction of 16 bytes (Start = 1, IdByteEnable = 1, RdWrEnable = 1, RdWrSense = 1, ReadNack = 1, Stop = 1, IdByte contains primary id byte, TxRxByteCount = 16), etc.

The CPU can thus program the number of bytes to be transmitted or received (up to a maximum of 20) on the LSS bus before it gets interrupted. This allows it to insert arbitrary delays in a transfer at a byte boundary. For example, the CPU may want to transmit 30 bytes to a QA chip but insert a delay between the 20th and 21st bytes sent. It does this by first writing 20 bytes to the data buffer. It then writes a command to generate a START condition, send the primary id byte and then transmit the 20 bytes from the data buffer. When interrupted by the LSS block to indicate successful completion of the command, the CPU can then write the remaining 10 bytes to the data buffer. It can then wait for a defined period of time before writing a command to transmit the 10 bytes from the data buffer and generate a STOP condition to terminate the transaction over the LSS bus.
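The 30-byte example generalises to any length: split the transfer into commands of at most 20 bytes, put START and the id byte on the first command and STOP on the last. The sketch below only plans the command fields (the struct and function names are hypothetical); it does not touch the hardware, and the CPU would still refill the buffer and optionally wait between commands.

```c
#include <stdbool.h>

#define LSS_BUFFER_BYTES 20   /* size of the LssNBuffer data buffer */

/* Fields of one command in a split write (names follow Table 63). */
typedef struct {
    bool start, id_byte_enable, stop;
    int  tx_count;            /* TxRxByteCount for this command */
} lss_write_chunk_t;

/* Split a write of 'total' bytes into commands of at most 20 bytes.
 * Returns the number of commands written to 'out'. */
static int lss_plan_write(int total, lss_write_chunk_t *out, int max_out)
{
    int n = 0;
    for (int done = 0; done < total && n < max_out; n++) {
        int len = total - done;
        if (len > LSS_BUFFER_BYTES)
            len = LSS_BUFFER_BYTES;
        out[n].start = out[n].id_byte_enable = (done == 0); /* first only */
        done += len;
        out[n].stop = (done == total);                      /* last only */
        out[n].tx_count = len;
    }
    return n;
}
```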

An interrupt to the CPU is generated for one cycle when any bit in LssNIntStatus is set. The CPU can read 
LssNIntStatus to discover the source of the interrupt and can clear a bit in LssNIntStatus by writing a 1 to 
the corresponding bit in LssNIntStatus register. Alternatively the CPU can start a new command which 
will automatically reset all LssNIntStatus bits. 



Table 63. LSS command register description

Bit(s) | Name | Description
0 | Start | When 1, issue a START condition on the LSS bus.
1 | IdByteEnable | ID byte transmit enable: 1 - transmit byte in IdByte field; 0 - ignore byte in IdByte field
2 | RdWrEnable | Read/write transfer enable: 0 - ignore settings of RdWrSense, ReadNack and TxRxByteCount; 1 - if RdWrSense is 0, then perform a write transfer of TxRxByteCount bytes from the data buffer. If RdWrSense is 1, then perform a read transfer of TxRxByteCount bytes into the data buffer. Each byte should be acknowledged and the last byte received is acknowledged/not-acknowledged according to the setting of ReadNack.
3 | RdWrSense | Read/write sense indicator: 0 - write; 1 - read
4 | ReadNack | Indicates, for a read transfer, whether to issue an acknowledge or a not-acknowledge after the last byte received (indicated by TxRxByteCount): 0 - issue acknowledge after last byte received; 1 - issue not-acknowledge after last byte received.
5 | Stop | When 1, issue a STOP condition on the LSS bus.
7:6 | reserved | Must be 0
15:8 | IdByte | Byte to be transmitted if IdByteEnable is 1. Bit 8 corresponds to the LSB.
20:16 | TxRxByteCount | Number of bytes to be transmitted from the data buffer or the number of bytes to be received into the data buffer. The maximum value that should be programmed is 20, as the size of the data buffer is 20 bytes.
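Given the bit positions in Table 63, assembling a command word for LssNCmd is straightforward. In this sketch the function name is hypothetical, reserved bits 7:6 are left at 0, and clamping TxRxByteCount to 20 is an added safeguard rather than documented hardware behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack an LSS command word using the bit positions of Table 63. */
static uint32_t lss_cmd_word(bool start, bool id_en, bool rdwr_en,
                             bool rd_sense, bool read_nack, bool stop,
                             uint8_t id_byte, unsigned count)
{
    if (count > 20)
        count = 20;                         /* data buffer holds 20 bytes */
    return (uint32_t)start                  /* bit 0      Start         */
         | (uint32_t)id_en     << 1         /* bit 1      IdByteEnable  */
         | (uint32_t)rdwr_en   << 2         /* bit 2      RdWrEnable    */
         | (uint32_t)rd_sense  << 3         /* bit 3      RdWrSense     */
         | (uint32_t)read_nack << 4         /* bit 4      ReadNack      */
         | (uint32_t)stop      << 5         /* bit 5      Stop          */
         | (uint32_t)id_byte   << 8         /* bits 15:8  IdByte        */
         | (uint32_t)(count & 0x1F) << 16;  /* bits 20:16 TxRxByteCount */
}
```

For instance, the complete 16-byte read from the example list, addressed with id byte 0x15, packs to 0x0010153F.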



The data buffer is implemented in the LSS master block. When the CPU writes to the LssNBuffer registers, the data written is presented to the LSS master block via the lssN_buffer_wrdata bus and the configuration registers block pulses the lssN_buffer_wen bit corresponding to the register written. For example, if LssNBuffer[2] is written to, lssN_buffer_wen[2] will be pulsed. When the CPU reads the LssNBuffer registers the configuration registers block reflects the lssN_buffer_rdata bus back to the CPU.

19.3.3 LSS master unit 

The LSS master unit is instantiated for both LSS bus 0 and LSS bus 1. It controls transactions on the LSS 
bus by means of the state machine shown in Figure 65, which interprets the commands that are written by 
the CPU. It also contains a single 20 byte data buffer used for transmitting and receiving data. 

The CPU can write data to be transmitted on the LSS bus by writing to the LssNBuffer registers. It can also read data that the LSS master unit receives on the LSS bus by reading the same registers. The LSS master always transmits or receives bytes to or from the data buffer in the same order.

For a transmit command, LssNBuffer[0][7:0] gets transmitted first, then LssNBuffer[0][15:8], LssNBuffer[0][23:16], LssNBuffer[0][31:24], LssNBuffer[1][7:0] and so on until TxRxByteCount number of bytes are transmitted. A receive command fills data into the buffer in the same order. The buffer start point is reset for each new command.
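The buffer ordering means byte k of a transfer lives in LssNBuffer[k / 4] at bit offset 8 * (k % 4). A minimal sketch of that mapping, plus a packer that loads a byte stream into the five buffer words (function names are hypothetical). Within each byte, bit 7 is still the first bit on the wire.

```c
#include <stdint.h>

/* Locate byte k (0-based, in transmit/receive order) within LssNBuffer:
 * byte 0 -> LssNBuffer[0][7:0], byte 1 -> LssNBuffer[0][15:8], ...,
 * byte 4 -> LssNBuffer[1][7:0], up to byte 19. */
static void lss_buffer_pos(unsigned k, unsigned *word, unsigned *lsb)
{
    *word = k / 4;
    *lsb  = 8 * (k % 4);
}

/* Pack up to 20 bytes into the five 32-bit buffer words in that order. */
static void lss_pack_buffer(const uint8_t *bytes, unsigned n, uint32_t words[5])
{
    for (unsigned i = 0; i < 5; i++)
        words[i] = 0;
    for (unsigned k = 0; k < n && k < 20; k++) {
        unsigned w, b;
        lss_buffer_pos(k, &w, &b);
        words[w] |= (uint32_t)bytes[k] << b;
    }
}
```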

All state machine outputs, flags and counters are cleared on reset. After a reset the state machine remains in the Idle state until lss_cmd_valid equals 1. If the Start bit of the command is 0 the state machine proceeds directly to the CheckIdByteEnable state. If the Start bit is 1 it proceeds to the GenerateStart state and issues a START condition on the LSS bus.

In the CheckIdByteEnable state, if the IdByteEnable bit of the command is 0 the state machine proceeds directly to the CheckRdWrEnable state. If the IdByteEnable bit is 1 the state machine enters the SendIdByte state and the byte in the IdByte field of the command is transmitted on the LSS. The WaitForIdAck state is then entered. If the byte is acknowledged, the state machine proceeds to the CheckRdWrEnable state. If the byte is not-acknowledged, the state machine proceeds to the GenerateInterrupt state and issues an interrupt to indicate a not-acknowledge was received after transmission of the primary id byte.

In the CheckRdWrEnable state, if the RdWrEnable bit of the command is 0 the state machine proceeds directly to the CheckStop state. If the RdWrEnable bit is 1, count is loaded with the value of the TxRxByteCount field of the command and the state machine enters either the ReceiveByte state if the RdWrSense bit of the command is 1 or the TransmitByte state if the RdWrSense bit is 0.

For a write transaction, the state machine keeps transmitting bytes from the data buffer, decrementing count after each byte transmitted, until count is 1. If all the bytes are successfully transmitted the state machine proceeds to the CheckStop state. If the slave QA chip not-acknowledges a transmitted byte, the state machine enters the GenerateInterrupt state and issues an interrupt to the CPU to indicate this error.

For a read transaction, the state machine keeps receiving bytes into the data buffer, decrementing count after each byte received, until count is 1. After each byte received the LSS master must issue an acknowledge. After the last expected byte (i.e. when count is 1) the state machine checks the ReadNack bit of the command to see whether it must issue an acknowledge or not-acknowledge for that byte. The CheckStop state is then entered.

In the CheckStop state, if the Stop bit of the command is 0 the state machine proceeds directly to the GenerateInterrupt state. If the Stop bit is 1 it proceeds to the GenerateStop state and issues a STOP condition on the LSS bus before proceeding to the GenerateInterrupt state. In both cases an interrupt is issued to indicate successful completion of the command. The state machine then enters the Idle state to await the next command.

The CPU may abort the current transfer at any time by performing a write to the Reset register of the LSS block.
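The state flow described above can be approximated in software to reason about which interrupt status bit a command ends with. This is a behavioural sketch, not RTL: each state takes one loop iteration instead of many pclk cycles, the struct layout and slave-response array are hypothetical, and treating TxRxByteCount = 0 as "no transfer" is an assumption the text does not cover.

```c
#include <stdbool.h>
#include <stdint.h>

/* Command fields from Table 63 (hypothetical struct layout). */
typedef struct {
    bool start, id_byte_enable, rdwr_enable, rdwr_sense, read_nack, stop;
    uint8_t id_byte;
    unsigned count;                          /* TxRxByteCount */
} lss_cmd_t;

typedef enum {
    IDLE, GENERATE_START, CHECK_ID_BYTE_ENABLE, SEND_ID_BYTE, WAIT_FOR_ID_ACK,
    CHECK_RDWR_ENABLE, TRANSMIT_BYTE, WAIT_FOR_ACK, RECEIVE_BYTE,
    SEND_ACK, SEND_NACK, CHECK_STOP, GENERATE_STOP, GENERATE_INTERRUPT
} lss_state_t;

/* LssNIntStatus bits (Table 62). */
#define LSS_INT_DONE      0x1u
#define LSS_INT_ID_NACK   0x2u
#define LSS_INT_DATA_NACK 0x4u

/* Walk the state machine for one command. slave_ack[i] is the slave's
 * response to the i-th byte the master transmits (id byte first, if
 * enabled). *bytes_moved counts data bytes transferred. Returns the
 * status bits set when GenerateInterrupt is reached. */
static uint8_t lss_run_command(const lss_cmd_t *c, const bool *slave_ack,
                               unsigned *bytes_moved)
{
    lss_state_t s = c->start ? GENERATE_START : CHECK_ID_BYTE_ENABLE;
    unsigned count = 0, tx_index = 0;
    uint8_t status = 0;

    *bytes_moved = 0;
    while (s != IDLE) {
        switch (s) {
        case GENERATE_START:
            s = CHECK_ID_BYTE_ENABLE;
            break;
        case CHECK_ID_BYTE_ENABLE:
            s = c->id_byte_enable ? SEND_ID_BYTE : CHECK_RDWR_ENABLE;
            break;
        case SEND_ID_BYTE:                   /* 8 clocks, IdByte[7] first */
            s = WAIT_FOR_ID_ACK;
            break;
        case WAIT_FOR_ID_ACK:
            if (slave_ack[tx_index++]) {
                s = CHECK_RDWR_ENABLE;
            } else {
                status = LSS_INT_ID_NACK;
                s = GENERATE_INTERRUPT;
            }
            break;
        case CHECK_RDWR_ENABLE:
            count = c->count;
            if (!c->rdwr_enable || count == 0)  /* count == 0: assumption */
                s = CHECK_STOP;
            else
                s = c->rdwr_sense ? RECEIVE_BYTE : TRANSMIT_BYTE;
            break;
        case TRANSMIT_BYTE:
            s = WAIT_FOR_ACK;
            break;
        case WAIT_FOR_ACK:
            if (!slave_ack[tx_index++]) {
                status = LSS_INT_DATA_NACK;
                s = GENERATE_INTERRUPT;
                break;
            }
            (*bytes_moved)++;
            if (count > 1) { count--; s = TRANSMIT_BYTE; } else s = CHECK_STOP;
            break;
        case RECEIVE_BYTE:
            (*bytes_moved)++;
            /* Only the last byte (count == 1) may be not-acknowledged. */
            s = (count == 1 && c->read_nack) ? SEND_NACK : SEND_ACK;
            break;
        case SEND_ACK:
        case SEND_NACK:
            if (count > 1) { count--; s = RECEIVE_BYTE; } else s = CHECK_STOP;
            break;
        case CHECK_STOP:
            status = LSS_INT_DONE;
            s = c->stop ? GENERATE_STOP : GENERATE_INTERRUPT;
            break;
        case GENERATE_STOP:
            s = GENERATE_INTERRUPT;
            break;
        case GENERATE_INTERRUPT:             /* pulse lss_icu_irq, then Idle */
        default:
            s = IDLE;
            break;
        }
    }
    return status;
}
```

For example, a complete three-byte write whose second data byte is not acknowledged ends with status 0x4 and one byte moved.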
19.3.3.1 START and STOP generation

START and STOP conditions, which signal the beginning and end of data transmission, occur when the LSS master generates a falling and rising edge respectively on the data while the clock is high.

In the GenerateStart state, lss_gpio_clk is held high with lss_gpio_e remaining deasserted (so the data line is pulled high externally) for LssClockHighPeriod pclk cycles. Then lss_gpio_e is asserted and lss_gpio_do is pulled low (to drive a 0 on the data line, creating a falling edge) with lss_gpio_clk remaining high for another LssClockHighPeriod pclk cycles.

In the GenerateStop state, both lss_gpio_clk and lss_gpio_do are pulled low followed by the assertion of lss_gpio_e to drive a 0 while the clock is low. After LssClockLowPeriod pclk cycles, lss_gpio_clk is set high. After a further LssClockHighPeriod pclk cycles, lss_gpio_e is deasserted to release the data bus and create a rising edge on the data bus during the high period of the clock.
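The two sequences can be written down as lists of pin phases (lss_gpio_clk level, lss_gpio_e state, duration in pclk cycles). The sketch assumes lss_gpio_do is held at 0 throughout, so the data line is low whenever lss_gpio_e is asserted; the type and function names are hypothetical, and the final STOP phase is given 0 cycles because its length is not specified here.

```c
/* One phase of pin activity held for 'cycles' pclk cycles.
 * oe = 1: lss_gpio_e asserted with lss_gpio_do = 0 (line driven low);
 * oe = 0: line released and pulled high externally. */
typedef struct { int clk; int oe; unsigned cycles; } lss_phase_t;

/* GenerateStart: clock high with data released, then data driven low
 * while the clock stays high (falling data edge = START). */
static int lss_start_phases(unsigned high, lss_phase_t p[2])
{
    p[0] = (lss_phase_t){ .clk = 1, .oe = 0, .cycles = high };
    p[1] = (lss_phase_t){ .clk = 1, .oe = 1, .cycles = high };
    return 2;
}

/* GenerateStop: clock and data low, then clock high with data still low,
 * then data released while the clock is high (rising data edge = STOP). */
static int lss_stop_phases(unsigned low, unsigned high, lss_phase_t p[3])
{
    p[0] = (lss_phase_t){ .clk = 0, .oe = 1, .cycles = low };
    p[1] = (lss_phase_t){ .clk = 1, .oe = 1, .cycles = high };
    p[2] = (lss_phase_t){ .clk = 1, .oe = 0, .cycles = 0 };
    return 3;
}
```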



19.3.3.2 Clock pulse generation

The LSS master holds lss_gpio_clk high while the LSS bus is inactive. A clock pulse is generated for each bit transmitted or received over the LSS bus. It is generated by first holding lss_gpio_clk low for LssClockLowPeriod pclk cycles, and then high for LssClockHighPeriod pclk cycles.



19.3.3.3 Data reception

The input data, gpio_lss_di, is first synchronised to the pclk domain by means of two flip-flops clocked by pclk. The LSS master generates a clock pulse for each bit received. The output lss_gpio_e is deasserted on the falling edge of lss_gpio_clk to release the data bus. The value on the synchronised gpio_lss_di is sampled on the rising edge of lss_gpio_clk (the data should be averaged over a further 3 stage register to avoid possible glitch detection). The data is only considered as a valid bit at the next falling edge of lss_gpio_clk, provided a START or STOP is not generated in the meantime.
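The receive conditioning path (two synchroniser flip-flops followed by a 3-stage register) can be modelled one pclk at a time. The document says only that the data is "averaged" over the 3 stages; this sketch assumes a 2-of-3 majority vote, which is one common way to reject single-cycle glitches. The type and function names are hypothetical.

```c
#include <stdint.h>

/* gpio_lss_di conditioning, modelled per pclk cycle. */
typedef struct {
    uint8_t sync[2];   /* two synchroniser flip-flops */
    uint8_t hist[3];   /* further 3-stage register */
} lss_rx_filter_t;

/* Clock in one raw sample and return the filtered (majority) value. */
static int lss_rx_filter_step(lss_rx_filter_t *f, int raw)
{
    /* Update from the last stage backwards so each stage takes the
     * previous cycle's value of its predecessor, like clocked registers. */
    f->hist[2] = f->hist[1];
    f->hist[1] = f->hist[0];
    f->hist[0] = f->sync[1];
    f->sync[1] = f->sync[0];
    f->sync[0] = (uint8_t)(raw & 1);
    return (f->hist[0] + f->hist[1] + f->hist[2]) >= 2;
}
```

A steady level reaches the output after four pclk cycles, while a one-cycle pulse never does.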

In the ReceiveByte state, the state machine generates 8 clock pulses. On each rising edge of lss_gpio_clk the synchronised gpio_lss_di is sampled. The first bit sampled is LssNBuffer[0][7], the second LssNBuffer[0][6], etc. to LssNBuffer[0][0]. For each byte received the state machine either sends a NACK or an ACK, depending on the command configuration and the number of bytes received.

In the SendNack state the state machine generates a single clock pulse. lss_gpio_e is deasserted and the LSS data line is pulled high externally to issue a not-acknowledge.

In the SendAck state the state machine generates a single clock pulse. lss_gpio_e is asserted and a 0 is driven on lss_gpio_do after the lss_gpio_clk falling edge to issue an acknowledge.



19.3.3.4 Data transmission

The LSS master generates a clock pulse for each bit transmitted. Data is output on the LSS bus on the falling edge of lss_gpio_clk.

When the LSS master drives a logical zero on the bus it will assert lss_gpio_e and drive a 0 on lss_gpio_do after the lss_gpio_clk falling edge. lss_gpio_e will remain asserted and lss_gpio_do will remain low until the next lss_gpio_clk falling edge.

When the LSS master drives a logical one, lss_gpio_e should be deasserted at the lss_gpio_clk falling edge and remain deasserted at least until the next lss_gpio_clk falling edge. This is because the LSS bus will be externally pulled up to logical one via a pull-up resistor.

In the SendIdByte state, the state machine generates 8 clock pulses to transmit the byte in the IdByte field of the current valid command. On each falling edge of lss_gpio_clk a bit is driven on the data bus as outlined above. On the first falling edge IdByte[7] is driven on the data bus, on the second falling edge IdByte[6] is driven out, etc.
In the TransmitByte state, the state machine generates 8 clock pulses to transmit the byte at the output of the transmit FIFO. On each falling edge of lss_gpio_clk a bit is driven on the data bus as outlined above. On the first falling edge LssNBuffer[0][7] is driven on the data bus, on the second falling edge LssNBuffer[0][6] is driven out, etc. down to LssNBuffer[0][0].
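Because the master never drives a 1, transmitting a byte reduces to choosing lss_gpio_e for each bit period, MSB first, with lss_gpio_do held at 0. A minimal sketch with a hypothetical helper name:

```c
#include <stdint.h>

/* For each of the 8 bit periods (each starting at a falling edge of
 * lss_gpio_clk) return the value of lss_gpio_e: 1 = drive a 0 on the
 * line, 0 = release it so the pull-up supplies a 1. Bit 7 goes first. */
static void lss_tx_enables(uint8_t byte, int oe[8])
{
    for (int i = 0; i < 8; i++) {
        int bit = (byte >> (7 - i)) & 1;
        oe[i] = !bit;
    }
}
```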

In the WaitForAck state, the state machine generates a single clock pulse. On the rising edge of lss_gpio_clk the synchronised gpio_lss_di is sampled. A 0 indicates an acknowledge and ack_detect is pulsed; a 1 indicates a not-acknowledge and nack_detect is pulsed.

19.3.3.5 Data rate control 

The CPU can control the data rate by setting the clock period of the LSS bus clock by programming appropriate values in LssClockHighPeriod and LssClockLowPeriod. The default setting for both registers is 200 (pclk cycles), which corresponds to a transmission rate of 400 kHz on the LSS bus.
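Since one bit takes LssClockHighPeriod + LssClockLowPeriod pclk cycles, the bit rate is 160 MHz divided by that sum. The helpers below (hypothetical names) compute the rate for given register values and, conversely, a 50/50 setting for a target rate, saturating at the 16-bit register width.

```c
#include <stdint.h>

#define LSS_PCLK_HZ 160000000u   /* pclk frequency quoted in Table 62 */

/* Bit rate for given LssClockHighPeriod / LssClockLowPeriod values. */
static uint32_t lss_bit_rate_hz(uint16_t high, uint16_t low)
{
    uint32_t period = (uint32_t)high + low;
    return period ? LSS_PCLK_HZ / period : 0;
}

/* Register values for a target rate with a 50/50 duty cycle; an odd
 * leftover cycle goes into the low period. */
static void lss_periods_for_rate(uint32_t rate_hz, uint16_t *high, uint16_t *low)
{
    uint32_t period = rate_hz ? LSS_PCLK_HZ / rate_hz : 0;
    if (period > 0x1FFFEu)
        period = 0x1FFFEu;       /* both fields are only 16 bits wide */
    *high = (uint16_t)(period / 2);
    *low  = (uint16_t)(period - period / 2);
}
```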
[Figure 65 diagram: LSS master state machine, including the TransmitByte and GenerateInterrupt states, with transitions conditioned on ack_detect, nack_detect, count, RdWrEnable and RdWrSense (count = TxRxByteCount on entry to a transfer). State machine outputs, lss_icu_irq and LssStatusSet are zero unless otherwise indicated.]

Figure 65. LSS master state machine
DRAM Subsystem

20 DRAM Interface Unit (DIU)

20.1 Overview



Figure 66 shows how the DIU provides the interface between the on-chip 20 Mbit embedded DRAM and the rest of SoPEC. In addition to outlining the functionality of the DIU, this chapter provides a top-level overview of the memory storage and access patterns of SoPEC and the buffering required in the various SoPEC blocks to support those access requirements.

The main functionality of the DIU is to arbitrate between requests for access to the embedded DRAM and 
provide read or write accesses to the requesters. The DIU must also implement the initialisation sequence 
and refresh logic for the embedded DRAM. 

The arbitration mechanism is a hierarchical timeslot mechanism providing guaranteed bandwidth and latency to each DIU requester, with unused slots re-allocated to provide best effort accesses. The arbitration scheme is fully programmable.

The interface between the DIU and the SoPEC requesters is similar to the interface on PEC1, i.e. separate control, read data and write data busses.

The embedded DRAM is used principally to store: 

• CPU program code and data. 

• PEP (re)programming commands.

• Compressed pages containing contone, bi-level and raw tag data and header information. 

• Decompressed contone and bi-level data. 

• Dotline store during a print. 

• Print setup information such as tag format structures, dither matrices and dead nozzle information.
Figure 66. SoPEC System Top Level partition
20.2 IBM Cu-11 Embedded DRAM

20.2.1 Single bank

SoPEC will use the 1.5 V core voltage option in IBM's 0.13 µm class Cu-11 process.

The random read/write cycle time and the refresh cycle time are 3 cycles at 160 MHz [16]. An open page access will complete in 1 cycle if the page mode select signal is clocked at 320 MHz or 2 cycles if the page mode select signal is clocked every 160 MHz cycle. The page mode select signal will be clocked at 320 MHz in SoPEC. The DRAM word size is 256 bits.

Most SoPEC requesters will make single 256 bit DRAM accesses (see Section 20.4). These accesses will 
take 3 cycles as they are random accesses i.e. they will most likely be to a different memory row than the 
previous access. 

The entire 20 Mbit DRAM will be implemented as a single memory bank. In Cu-11, the maximum single instance size is 16 Mbit. The first 1 Mbit tile of each instance contains an area overhead, so the cheapest solution in terms of area is to have only 2 instances. 16 Mbit and 4 Mbit instances would together consume an area of 14.63 mm², as would 2 times 10 Mbit instances. 4 times 5 Mbit instances would require 17.2 mm².

The instance size will determine the frequency of refresh. Each refresh requires 3 clock cycles. In Cu-11 each row consists of 8 columns of 256-bit words. This means that 16 Mbit requires 8192 rows. A complete DRAM refresh is required every 3.2 ms, i.e. every 512,000 cycles at 160 MHz, so a row would have to be refreshed every 62 cycles. Two times 10 Mbit instances would require a refresh every 100 clock cycles, if the instances are refreshed in parallel. Having 4 times 5 Mbit instances means a refresh is required only every 200 cycles.
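The refresh figures follow from dividing the 3.2 ms refresh period (512,000 cycles at 160 MHz) by the number of rows per instance, where a row holds 8 × 256 = 2048 bits and 1 Mbit is taken as 2^20 bits. A small C check of that arithmetic (function and macro names are illustrative):

```c
#include <stdint.h>

#define DRAM_CLK_HZ       160000000u /* DRAM clocked at 160 MHz */
#define REFRESH_PERIOD_US 3200u      /* full refresh every 3.2 ms */
#define BITS_PER_WORD     256u
#define WORDS_PER_ROW     8u         /* Cu-11: 8 columns of 256-bit words */

/* Rows in an instance of 'mbit' Mbit (1 Mbit = 2^20 bits). */
static uint32_t dram_rows(uint32_t mbit)
{
    return (mbit << 20) / (BITS_PER_WORD * WORDS_PER_ROW);
}

/* Cycles between row refreshes when all instances of this size are
 * refreshed in parallel (rounded down). */
static uint32_t refresh_interval_cycles(uint32_t mbit)
{
    uint64_t cycles = (uint64_t)DRAM_CLK_HZ / 1000000u * REFRESH_PERIOD_US;
    return (uint32_t)(cycles / dram_rows(mbit));
}
```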

The SoPEC DRAM will be constructed as two 10 Mbit instances implemented as a single memory bank. 
20.3 SoPEC Memory Usage Requirements

The memory usage requirements for the embedded DRAM are shown in Table 64:
Table 64. Memory Usage Requirements 



vxtfi^/rBSSQa pBQB SiDie 




Compressed data page store for Bi4evel 
and contone data 


Decompressed Contone 
Store 


106 Kbyte 


1 3824 lines with scale factor 6 = 2304 pixels, 

store 12 lines, 4 colors = 108 kB 

13824 lines with scale factor S = 2765 pixels, 

aiUic i£ lines, *r CUtUlo — IVV r^O 


Spot line store 


5.1 Kbyte 


ioo^4 Qois/itne so tS lines is i kd 


Tag Format Structure 


55 KDyie (3o4 uOi tine lags o 
1600 dpi) 

1 2 Kbyte (2.5 mm tags ^ 800 
dpi) 


90 kd im lor oo*f> uoi line lags 

2.5 mm tags (1/1 0th inch) @ 1600 dpi require 

160 dot lines = 160/384 x55 or 23 KB 

2.5 mm tags O 800 dpi require 80/384 x55 = 

12 kB 


Dither Matrix store 


4 Kbytes 


64x64 dither matrix is 4 kB 
1 28x1 28 dither matrix is 1 6 kB 
256x256 dither matrix is 64 kB 


DNC Dead Nozzle Table 


1.4 Kbytes 


Delta encoded, (10 bit delta position + 6 dead 
nozzle mask) x% Dnozzle 
5% dead rtozzles requires (10+6)x 692 Dnoz- 
zles= 1.4 Kbytes 


Dot-nne store 


319 Kbytes 


Assume each color row is separated by 5 dot 
lines on the print head 
The dot line store will be 0+5+10,.. 50+55 
330 half dot lines + 48 extra half dot lines (4 
per dot row) = 378 half dot lines = SIQKbytes 


PCU Program code 


8 Kbytes 


1024 commands of 64 bits = 8 kB 


CPU 


64 Kbytes 


Program code and data 


TOTAL 


2570 Kbytes (1 2 Kbyte TPS 
storage) 

2613 Kbytes (55 Kbyte TPS) 





Note: 



Total storage of 2570 Kbytes will be reduced to 2560 Kbytes to align to 20 Mbit DRAM. 
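Several entries in Table 64 are simple products of the line width and per-item sizes. The sketch below recomputes the contone store, dot-line store and dead nozzle table sizes, and the 2560 Kbyte figure for 20 Mbit. It assumes 1 Kbyte = 1024 bytes, one byte per contone color, and that a "half dot line" holds half of the 13824 dots at 1 bit per dot. That last point is an interpretation that happens to reproduce the 319 Kbyte figure.

```python
import math

KBYTE = 1024
DOTS_PER_LINE = 13824          # dots per printed line, as used in Table 64


def contone_store_kbytes(scale_factor: int, lines: int = 12, colors: int = 4) -> float:
    """Decompressed contone store, assuming one byte per color per pixel."""
    pixels = math.ceil(DOTS_PER_LINE / scale_factor)
    return pixels * lines * colors / KBYTE


def dot_line_store(rows: int = 12, spacing: int = 5, extra_per_row: int = 4):
    """Return (half dot lines, Kbytes). Assumes 1 bit per dot and that a half
    dot line holds half of the dots of a full line."""
    half_lines = sum(r * spacing for r in range(rows))   # 0 + 5 + ... + 55 = 330
    half_lines += extra_per_row * rows                    # + 48 extra = 378
    kbytes = half_lines * (DOTS_PER_LINE // 2) / 8 / KBYTE
    return half_lines, kbytes


def dead_nozzle_table(dead_fraction: float = 0.05, entry_bits: int = 10 + 6):
    """Return (entries, Kbytes) for the delta-encoded dead nozzle table."""
    entries = math.ceil(dead_fraction * DOTS_PER_LINE)
    return entries, entries * entry_bits / 8 / KBYTE


DRAM_KBYTES = 20 * 2 ** 20 // 8 // KBYTE   # 20 Mbit expressed in Kbytes
```

For example, `dot_line_store()` returns 378 half dot lines and about 318.9 Kbytes. `dead_nozzle_table()` gives 692 entries, or about 1.35 Kbytes, which rounds to the 1.4 Kbytes quoted in the table.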
20.4 SoPEC Memory Access Patterns

Table 65 shows a summary of the blocks on SoPEC requiring access to the embedded DRAM and their individual memory access patterns. Most blocks will access the DRAM in single 256-bit accesses. All accesses must be padded to 256 bits except for 64-bit CDU write accesses and CPU write accesses. Bits which should not be written are masked using the individual DRAM bit write inputs or byte write inputs, depending on the foundry. Using single 256-bit accesses minimizes the buffering required in the SoPEC DRAM requesters.



Table 65. Memory access patterns of SoPEC DRAM Requesters

DRAM requester | Direction | Memory access pattern
CPU | R | Single 256-bit reads.
CPU | W | Single 32-bit, 16-bit or 8-bit writes.
SCB | W | Single 256-bit writes.
CDU | R | Single 256-bit reads of the compressed contone data.
CDU | W | Each CDU access is a write to 4 consecutive DRAM words in the same row, but only 64 bits of each word are written, with the remaining bits write masked. The access time for this 4-word page mode burst is 3 + 1 + 1 + 1 = 6 cycles if the page mode select signal is clocked at 320 MHz.
CFU | R | Single 256-bit reads.
LBD | R | Single 256-bit reads.
SFU | R | Separate single 256-bit reads for the previous and current line, but sharing the same DIU interface.
SFU | W | Single 256-bit writes.
TE(TD) | R | Single 256-bit reads. Each read returns 2 x 128-bit tags.
TE(TFS) | R | Single 256-bit reads. The TFS is 136 bytes, so there is unused data in the fifth 256-bit read. A total of 5 reads is required.
HCU | R | Single 256-bit reads. A 128 x 128 dither matrix requires 4 reads per line with double buffering. A 256 x 256 dither matrix requires 8 reads at the end of the line with single buffering. Dither matrices have a start address, end address and line advance increment.
DNC | R | Single 256-bit dead nozzle table reads. Each dead nozzle table read contains 16 dead nozzle table entries, each of 10 delta bits plus 6 dead nozzle mask bits.
DWU | W | Single 256-bit writes, since DRAM access can be enabled or disabled per color plane.
LLU | R | Single 256-bit reads, since DRAM access can be enabled or disabled per color plane.
PCU | R | Single 256-bit reads. Each PCU command is 64 bits, so each 256-bit word can contain 4 PCU commands. PCU reads from DRAM used for reprogramming the PEP should be executed with minimum latency. If this occurs between pages then there will be free bandwidth, as most of the other SoPEC units will not be requesting from DRAM. If this occurs between bands then the LBD, CDU and TE bandwidth will be free. So the PCU should have high priority for access to any spare bandwidth.
Refresh | | Single refresh.
20.5 Buffering Required in SoPEC DRAM Requesters

If each DIU access is a single 256-bit access then we need to provide a 256-bit double buffer in the DRAM requester. If the DRAM requester has a 64-bit interface then this can be implemented as an 8 x 64-bit FIFO.
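The following is a minimal sketch of a 256-bit double buffer built as an 8 x 64-bit FIFO. Two complete 256-bit DRAM words fit, so one word can be consumed while the next one arrives. The class, its methods, the least-significant-beat-first ordering and the "four free entries" check are all hypothetical simplifications. A real requester would also need to reserve space for any request it has already issued.

```python
from collections import deque

MASK64 = (1 << 64) - 1


def split_256(word256: int) -> list:
    """Split a 256-bit word into four 64-bit beats, least significant first
    (the beat order is an assumption made for this sketch)."""
    return [(word256 >> (64 * i)) & MASK64 for i in range(4)]


class Fifo8x64:
    """A 256-bit double buffer modelled as an 8-entry FIFO of 64-bit words."""

    DEPTH = 8

    def __init__(self) -> None:
        self.entries = deque()

    def free(self) -> int:
        return self.DEPTH - len(self.entries)

    def can_accept_256(self) -> bool:
        # Room for one more complete 256-bit DRAM word (four 64-bit entries).
        return self.free() >= 4

    def push(self, word64: int) -> None:
        if self.free() == 0:
            raise OverflowError("FIFO full")
        self.entries.append(word64 & MASK64)

    def pop(self) -> int:
        return self.entries.popleft()
```

After one 256-bit word has been pushed, there is still room for the next one. After two, the requester must drain at least four entries before it can accept another word.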



Table 66. Buffer sizes in SoPEC DRAM requesters

DRAM requester | Direction | Access patterns | Buffering required in block
CPU | R | Single 256-bit reads. | Cache.
CPU | W | Single 32-bit writes, but allowing 16-bit or byte addressable writes. | None.
SCB | W | Single 256-bit writes. | Double 256-bit buffer.
CDU | R | Single 256-bit reads of the compressed contone data. | Double 256-bit buffer.
CDU | W | Each CDU access is a write to 4 consecutive DRAM words in the same row, but only 64 bits of each word are written, with the remaining bits write masked. | Double half JPEG block buffer.
CFU | R | Single 256-bit reads. | Double 256-bit buffer.
LBD | R | Single 256-bit reads. | Double 256-bit buffer.
SFU | R | Separate single 256-bit reads for the previous and current line, but sharing the same DIU interface. | Double 256-bit buffer for each read channel.
SFU | W | Single 256-bit writes. | Double 256-bit buffer.
TE(TD) | R | Single 256-bit reads. | Double 256-bit buffer.
TE(TFS) | R | Single 256-bit reads. The TFS is 136 bytes, so there is unused data in the fifth 256-bit read. A total of 5 reads is required. | Double line buffer for 136 bytes implemented in TE.
HCU | R | Single 256-bit reads. A 128 x 128 dither matrix requires 4 reads per line with double buffering. A 256 x 256 dither matrix requires 8 reads at the end of the line with single buffering. | Configurable between a double 128-byte buffer and a single 256-byte buffer.
DNC | R | Single 256-bit reads. | Double 256-bit buffer. Deeper buffering could be specified to cope with local clusters of dead nozzles.
DWU | W | Single 256-bit writes per enabled odd/even color plane. | Double 256-bit buffer per color plane.
LLU | R | Single 256-bit reads per enabled odd/even color plane. | Double 256-bit buffer per color plane.
PCU | R | Single 256-bit reads. Each PCU command is 64 bits, so each 256-bit DRAM read can contain 4 PCU commands. The requested command is read from DRAM together with the next 3 contiguous 64-bit commands, which are cached to avoid unnecessary DRAM reads. | Single 256-bit buffer.
Refresh | | Single refresh. | None.
20.6 SoPEC DIU Bandwidth Requirements

Table 67: SoPEC DIU Bandwidth Requirements

Block | Direction | Cycles between each 256-bit DRAM access to meet peak bandwidth | Peak bandwidth (bits/cycle) | Average bandwidth (bits/cycle) | Allocated timeslots (note 1)
SCB | W | 780 (note 2) | 0.328 | 0.328 | 0.5
CDU | R | 128 (SF = 4), 288 (SF = 6), 1:1 compression (note 3) | 32/n² (SF = n), 0.9 (SF = 6), 2 (SF = 4) (1:1 compression) | 32/(10·n²) (SF = n), 0.09 (SF = 6), 0.2 (SF = 4) (10:1 compression) (note 4) | 1 (SF = 6), 2 (SF = 4)
CDU | W | For individual accesses: 16 cycles (SF = 4), 36 cycles (SF = 6), n² cycles (SF = n). Will be implemented as a page mode burst of 4 accesses every 64 cycles (SF = 4), 144 cycles (SF = 6), 4·n² cycles (SF = n) (note 5) | 64/n² (SF = n), 1.8 (SF = 6), 4 (SF = 4) | 32/n² (SF = n), 0.9 (SF = 6), 2 (SF = 4) (note 6) | 2 (SF = 6), 4 (SF = 4)
CFU | R | 32 (SF = 4), 48 (SF = 6) (note 7) | 32/n (SF = n), 5.4 (SF = 6), 8 (SF = 4) | 32/n (SF = n), 5.4 (SF = 6), 8 (SF = 4) | 5.5 (SF = 6), 8 (SF = 4)
LBD | R | 256 (1:1 compression) (note 8) | 1 (1:1 compression) | 0.1 (10:1 compression) (note 9) |
SFU | R | 128 (note 10) | | |
SFU | W | 256 (note 11) | | |
TE(TD) | R | 252 (note 12) | 1.02 | 1.02 | 1.25
TE(TFS) | R | 5 reads per line (note 13) | 0.093 | 0.093 | 0.25
HCU | R | 4 reads per line for 128 x 128 dither matrix (note 14) | 0.074 | 0.074 | 0.25
DNC | R | 106 (5% dead nozzles, 10-bit delta encoded) (note 15) | 2.4 (clump of dead nozzles) | 0.8 (equally spaced dead nozzles) | 2.5
DWU | W | 6 writes every 256 (note 16) | | |
LLU | R | 8 reads every 256 (note 17) | | |
PCU | R | 256 (note 18) | | |
Refresh | | 100 (note 19) | 2.56 | 2.56 | 2.75
TOTAL | | | SF = 6: 34; SF = 4: 39.5 (excluding CPU) | SF = 6: 27.5; SF = 4: 31.2 (excluding CPU) | SF = 6: 35; SF = 4: 40.5 (excluding CPU)


Notes:

1: The number of allocated timeslots is based on 64 timeslots, each of 1 bit/cycle, but broken down to a granularity of 0.25 bit/cycle.

2: 50 Mbit/s is 0.328 bits/cycle, or 256 bits every 780 cycles.

3: At 1:1 compression the CDU must read a 4 color pixel (32 bits) every SF² cycles.

4: At 10:1 average compression the CDU must read a 4 color pixel (32 bits) every 10·SF² cycles.

5: A 4 color pixel (32 bits) is required, on average, by the CFU every SF² (scale factor) cycles.

The time available to write the data is a function of the size of the buffer in DRAM. 1.5 buffering means a 4 color pixel (32 bits) must be written every SF²/2 cycles. Therefore, at a scale factor of SF, 64 bits are required every SF² cycles.

Since 64 valid bits are written per 256-bit write (Figure 104 on page 282), the DRAM is accessed every SF² cycles, i.e. at SF4 an access every 16 cycles, and at SF6 an access every 36 cycles.

If a page mode burst of 4 accesses is used then the burst takes 3 + 1 + 1 + 1 = 6 cycles. This means that at SF, a set of 4 back-to-back accesses must occur every 4·SF² cycles. This assumes the page mode select signal is clocked at 320 MHz. CDU timeslots therefore take 6 cycles.

For scale factors lower than 4, double buffering will be used.

6: The average bandwidth is 1/2 the peak bandwidth in the case of 1.5 buffering.

7: A 4 color pixel (32 bits) is read by the CFU every SF cycles. At SF4, 32 bits are required every 4 cycles, or 256 bits every 32 cycles. At SF6, 32 bits are required every 6 cycles, or 256 bits every 48 cycles.

8: At 1:1 compression, 1 bit/cycle (256 bits every 256 cycles) is required.

9: The average bandwidth required at 10:1 compression is 0.1 bits/cycle.

10: Two separate reads of 1 bit/cycle.

11: Write at 1 bit/cycle.

12: Each tag can be consumed in at most 126 dot cycles and requires 128 bits. This is a maximum rate of 256 bits every 252 cycles.

13: 17 x 64-bit reads per line in PEC1 is 5 x 256-bit reads per line in SoPEC. Double-line buffered storage.

14: 128 bytes read per line is 4 x 256-bit reads per line. Double-line buffered storage.

15: 5% dead nozzles, 10-bit delta encoded and stored with a 6-bit dead nozzle mask, requires 0.8 bits/cycle read access, or a 256-bit access every 320 cycles. This assumes the dead nozzles are evenly spaced out. In practice dead nozzles are likely to be clumped. Peak bandwidth is estimated as 3 times the average bandwidth.

16: 6 bits/cycle requires 6 x 256-bit writes every 256 cycles.

17: 6 bits per 160 MHz SoPEC cycle on average, but this will peak at 2 x 6 bits per 106 MHz printhead cycle, or 8 bits per SoPEC cycle. The PHI can equalise the DRAM access rate over the line so that the peak rate equals the average rate of 8 bits/cycle.

18: Assume one 256-bit read per 256 cycles is sufficient, i.e. a maximum latency of 256 cycles per access is allowable.

19: As an example, assume refresh must occur every 3.2 ms. Refresh occurs a row at a time over 5120 rows of 2 parallel 10 Mbit instances. Each refresh takes 3 cycles. This is equivalent to a timeslot every 100 cycles.
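The scale-factor formulas in notes 3, 5 and 7 can be checked directly. The sketch below evaluates them, along with the SCB, TE(TD), DNC and refresh figures. It assumes a 160 MHz pclk and, for note 2, 1 Mbit = 2^20 bits. The function and variable names are illustrative only.

```python
PCLK_HZ = 160_000_000   # assumed 160 MHz pclk


def cycles_per_256(bits_per_cycle: float) -> float:
    """Cycles between 256-bit DRAM accesses for a given bandwidth."""
    return 256 / bits_per_cycle


def cdu_read(sf: int):
    """Note 3: 32 bits every SF^2 cycles at 1:1 compression."""
    return 32 / sf ** 2, 8 * sf ** 2          # (bits/cycle, cycles per 256-bit read)


def cdu_write(sf: int):
    """Note 5: 64 bits every SF^2 cycles, issued as 4-access page mode bursts."""
    return 64 / sf ** 2, 4 * sf ** 2          # (bits/cycle, cycles between bursts)


def cfu_read(sf: int):
    """Note 7: 32 bits every SF cycles."""
    return 32 / sf, 8 * sf                    # (bits/cycle, cycles per 256-bit read)


# Note 2: 50 Mbit/s, assuming 1 Mbit = 2**20 bits.
scb_bits_per_cycle = 50 * 2 ** 20 / PCLK_HZ
# Note 12: 256 bits every 252 cycles.
te_td_bits_per_cycle = 256 / 252
# Note 15: 0.8 bits/cycle average, peak estimated at 3 x average.
dnc_peak = 3 * 0.8
# Note 19: one refresh timeslot every 100 cycles, expressed in bits/cycle.
refresh_equivalent = 256 / 100
```

The unrounded SCB rate is 0.32768 bits/cycle, so `cycles_per_256` returns about 781 cycles. The table's 780 comes from the rounded 0.328. Similarly, the DNC peak corresponds to about 106.7 cycles per access.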
20.7 DIU BUS TOPOLOGY
20.7.1 Basic topology 



Table 68. SoPEC DIU Requesters

Read | Write | Other
CPU | CPU | Refresh
CDU | SCB |
CFU | CDU |
LBD | SFU |
SFU | DWU |
TE(TD) | |
TE(TFS) | |
HCU | |
DNC | |
LLU | |
PCU | |

Table 68 shows the DIU requesters in SoPEC. There are 11 read requesters and 5 write requesters in SoPEC, compared with 8 read requesters and 4 write requesters in PEC1. Refresh is an additional requester.

In PEC1, the interface between the DIU and the DIU requesters had the following main features:

• separate control and address signals per DIU requester, multiplexed in the DIU according to the arbitration scheme;

• a separate 64-bit write data bus for each DRAM write requester, multiplexed in the DIU;

• a common 64-bit read bus from the DIU with separate enables to each DIU read requester.

Timing closure for this bussing scheme was straightforward in PEC1. This suggests that a similar scheme will also achieve timing closure in SoPEC. SoPEC has 5 more DRAM requesters, but it will be in a 0.13 µm process with more metal layers, and SoPEC will run at approximately the same speed as PEC1.

Using 256-bit busses would match the data width of the embedded DRAM, but such large busses may increase the size of the DIU and of the entire SoPEC chip. The SoPEC requestors would require double 256-bit wide buffers to match the 256-bit busses. These buffers, which must be implemented in flip-flops, are less area efficient than the 8-deep 64-bit wide register arrays that can be used with 64-bit busses. SoPEC will therefore use 64-bit data busses. Use of 256-bit busses would, however, simplify the DIU implementation, as local buffering of 256-bit DRAM data would not be required within the DIU.

20.7.1.1 CPU DRAM access

The CPU is the only DIU requestor for which access latency is critical. All DIU write requesters transfer write data to the DIU using separate point-to-point busses. The CPU will use the cpu_dataout[31:0] bus. CPU reads will not be over the shared 64-bit read bus. Instead, CPU reads will use a separate 256-bit read bus.

20.7.2 Making more efficient use of DRAM bandwidth 

The embedded DRAM is 256 bits wide. Because it takes 4 cycles to transfer the 256 bits over the 64-bit data busses of SoPEC, each access effectively lasts at least 4 cycles. It takes only 3 cycles to actually do a 256-bit random DRAM access in the case of IBM DRAM.
20.7.2.1 Common read bus

If we have a common read data bus, as in PEC1, then when doing back-to-back read accesses the next DRAM read cannot start until the read data bus is free. So each DRAM read access can occur only every 4 cycles. This is shown in Figure 67, with the actual DRAM access taking 3 cycles and leaving 1 unused cycle per access.



Figure 67. Shared read bus with 3 cycle random DRAM read accesses



20.7.2.2 Interleaving CPU and non-CPU read accesses

The CPU has a separate 256-bit read bus. All other reads are 256-bit accesses over a shared 64-bit read bus. Interleaving CPU and non-CPU read accesses means the effective duration of an interleaved access timeslot is the DRAM access time (3 cycles) rather than 4 cycles. Interleaving is achieved by ordering the DIU arbitration slot allocation appropriately.
Figure 68 shows interleaved CPU and non-CPU read accesses.

Figure 68. Interleaving CPU and non-CPU read accesses 



20.7.2.3 Interleaving read and write accesses

Having separate write data busses means write accesses can be interleaved with each other and with read 
accesses. So now the effective duration of an interleaved access timeslot is the DRAM access time (3 
cycles) rather than 4 cycles. Interleaving is achieved by ordering the DIU arbitration slot allocation appro- 
priately. 

Figure 69 shows interleaved read and write accesses. Figure 70 shows interleaved write accesses. 

Figure 69. Interleaving read and write accesses with 3 cycle random DRAM accesses

Write data still takes 4 cycles to transmit over 64-bit busses so 256-bit buffers are required in the DIU to 
gather the write data from the requesters. 
Figure 70. Interleaving write accesses with 3 cycle random DRAM accesses
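The timing rule behind Figures 67 to 70 can be summarised in a few lines. Two consecutive reads over the shared 64-bit bus are spaced 4 cycles apart. Any other pairing can follow the 3-cycle DRAM access. This is a sketch only: the access-kind labels are hypothetical, and it ignores arbitration and the 6-cycle CDU page mode writes.

```python
def access_start_cycles(accesses):
    """Start cycle of each DRAM access in a back-to-back sequence.

    accesses: list of "shared_read" (non-CPU read over the shared 64-bit bus),
    "cpu_read" (separate 256-bit bus) or "write" (point-to-point write bus).
    """
    starts = []
    cycle = 0
    prev = None
    for kind in accesses:
        if prev is not None:
            # Two shared-bus reads in a row must wait for the 4-cycle bus
            # transfer; anything else can follow the 3-cycle DRAM access.
            cycle += 4 if (prev == kind == "shared_read") else 3
        starts.append(cycle)
        prev = kind
    return starts
```

For example, four shared-bus reads start at cycles 0, 4, 8 and 12. Interleaving them with CPU reads or writes brings the spacing down to 3 cycles.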
20.7.3 Bus widths



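Table 69. SoPEC DIU Requesters Data Bus Width

Read requester | Bus width (bits) | Write requester | Bus width (bits)
CPU | 256 (separate) | CPU | 32
CDU | 64 (shared) | SCB | 64
CFU | 64 (shared) | CDU | 64
LBD | 64 (shared) | SFU | 64
SFU | 64 (shared) | DWU | 64
TE(TD) | 64 (shared) | |
TE(TFS) | 64 (shared) | |
HCU | 64 (shared) | |
DNC | 64 (shared) | |
LLU | 64 (shared) | |
PCU | 64 (shared) | |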

20.7.4 Conclusions

Reads and writes can be interleaved with a separate 256-bit read bus for the CPU for minimum latency 
DIU access. Interleaving can be performed by inserting write accesses or CPU accesses between shared 
read bus accesses. The interleaving is achieved by ordering the DIU arbitration slot allocation appropri- 
ately. 
20.8 SoPEC DRAM ADDRESSING SCHEME



The embedded DRAM is composed of 256-bit words. However, the CPU subsystem may need to write individual bytes of DRAM. Therefore it was decided to make the DIU byte addressable. 22 bits are required to byte address 20 Mbit of DRAM.

Most blocks read or write 256-bit words of DRAM. Therefore only the top 17 bits, i.e. bits 21 to 5, are required to address 256-bit word aligned locations.

The exceptions are:

• the CDU, which can write 64 bits, so only the top 19 address bits, i.e. bits 21 to 3, are required;

• CPU writes, which can be 8, 16 or 32 bits. The cpu_diu_wmask[1:0] pins indicate whether to write 8, 16 or 32 bits.

All DIU accesses must be within the same 256-bit aligned DRAM word.
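The address field widths above can be illustrated with a few bit operations. This sketch extracts the 17-bit word address (bits 21 to 5) and the 19-bit CDU address (bits 21 to 3). It also checks the rule that a CPU write must stay inside one 256-bit word. The function names are hypothetical, 1 Mbit is taken as 2^20 bits, and the encoding of cpu_diu_wmask[1:0] is not given here, so it is not modelled.

```python
DRAM_BYTES = 20 * 2 ** 20 // 8             # 20 Mbit of DRAM, assuming 1 Mbit = 2**20 bits
ADDR_BITS = (DRAM_BYTES - 1).bit_length()  # bits needed for a byte address


def word_address(byte_addr: int) -> int:
    """Bits 21..5: index of the 256-bit (32-byte) DRAM word."""
    return (byte_addr >> 5) & ((1 << 17) - 1)


def cdu_address(byte_addr: int) -> int:
    """Bits 21..3: index of the 64-bit (8-byte) location used by the CDU."""
    return (byte_addr >> 3) & ((1 << 19) - 1)


def within_one_word(byte_addr: int, size_bytes: int) -> bool:
    """True if a CPU write of 1, 2 or 4 bytes stays inside one 256-bit word."""
    if size_bytes not in (1, 2, 4):
        raise ValueError("CPU writes are 8, 16 or 32 bits")
    return byte_addr >> 5 == (byte_addr + size_bytes - 1) >> 5
```

For example, a 16-bit write at byte address 0x1E stays within word 0. A 32-bit write at the same address would cross into word 1 and so violates the rule.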



Doc: SoPEG_hardware_design 
Version: 2.3 



S3 Proprietary Document 



29 Nov 2002 
Page 207 



SoPEC : Hardware Design 



20.9 DIU Protocols 

The DIU protocols are 

• pipelined, i.e. the following transaction is initiated while the previous transfer is in progress.

• split transaction, i.e. the transaction is split into independent address and data transfers.

20.9.1 Read Protocol except CPU 

The SoPEC read requestors, except for the CPU, perform single 256-bit read accesses. The read data is transferred from the DIU in 4 consecutive cycles over a shared 64-bit read bus, diu_data[63:0]. The read address <unit>_diu_radr[21:5] is 256-bit aligned.

The read protocol is:

• <unit>_diu_rreq is asserted along with a valid <unit>_diu_radr[21:5].

• The DIU acknowledges the request with diu_<unit>_rack. The request should then be deasserted. The minimum number of cycles between <unit>_diu_rreq being asserted and the DIU generating a diu_<unit>_rack strobe is 2 cycles (1 cycle to register the request, 1 cycle to perform the arbitration - see Section 20.13.6).

• The read data is returned on diu_data[63:0] and its validity is indicated by diu_<unit>_rvalid.

• When four diu_<unit>_rvalid pulses have been received, <unit>_diu_rreq should be asserted again if there is a further request. diu_<unit>_rvalid will always be asserted by the DIU for four consecutive cycles. The first diu_<unit>_rvalid pulse will occur 3 cycles after diu_<unit>_rack (1 cycle to transfer the address to the DRAM, 2 cycles for the read data to be returned from the DRAM).



Figure 71. Read protocol for a SoPEC Unit making a single 256-bit access
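The minimum cycle counts in the read protocol can be combined into a timeline. The sketch below gives the earliest rack and rvalid cycles for one read. It covers both the four-beat shared-bus case and the single-beat CPU case (Section 20.9.2). The `arbitration_wait` parameter is a hypothetical stand-in for time spent waiting for a timeslot.

```python
def read_timeline(rreq_cycle: int, arbitration_wait: int = 0, cpu: bool = False) -> dict:
    """Earliest cycle numbers for one DIU read.

    rack follows rreq by at least 2 cycles (register + arbitrate); the first
    rvalid follows rack by 3 cycles (address to DRAM + 2-cycle DRAM read).
    Non-CPU reads return four 64-bit beats; the CPU gets one 256-bit beat.
    arbitration_wait models extra cycles spent waiting for a timeslot.
    """
    rack = rreq_cycle + 2 + arbitration_wait
    first_rvalid = rack + 3
    beats = 1 if cpu else 4
    return {"rack": rack, "rvalid": list(range(first_rvalid, first_rvalid + beats))}
```

With a request at cycle 0, rack arrives at cycle 2 and data is valid on cycles 5 to 8. For the CPU, registering the single data beat one cycle later gives the 6-cycle total estimated in Table 71.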



20.9.2 Read Protocol for CPU 

The CPU performs single 256-bit read accesses. The read data is transferred from the DIU over a dedicated 256-bit read bus for DRAM data, dram_cpu_data[255:0]. The read address cpu_adr[21:5] is 256-bit aligned.
The CPU DIU read protocol is:

• cpu_diu_rreq is asserted along with a valid cpu_adr[21:5].

• The DIU acknowledges the request with diu_cpu_rack. The request should then be deasserted. The minimum number of cycles between cpu_diu_rreq being asserted and the DIU generating a diu_cpu_rack strobe is 2 cycles (1 cycle to register the request, 1 cycle to perform the arbitration - see Section 20.13.6).

• The read data is returned on dram_cpu_data[255:0] and its validity is indicated by diu_cpu_rvalid.

• When the diu_cpu_rvalid pulse has been received, cpu_diu_rreq should be asserted again if there is a further request. The diu_cpu_rvalid pulse will occur 3 cycles after diu_cpu_rack (1 cycle to transfer the address to the DRAM, 2 cycles for the read data to be returned from the DRAM).



Figure 72. Read protocol for a CPU making a single 256-bit access



20.9.3 Write Protocol except CPU and CDU 

The SoPEC write requestors, except for the CPU and CDU, perform single 256-bit write accesses. The write data is transferred to the DIU in 4 consecutive cycles over dedicated point-to-point 64-bit write data busses. The write address <unit>_diu_wadr[21:5] is 256-bit aligned.

The write protocol is:

• <unit>_diu_wreq is asserted along with a valid <unit>_diu_wadr[21:5].

• The DIU acknowledges the request with diu_<unit>_wack. The request should then be deasserted. The minimum number of cycles between <unit>_diu_wreq being asserted and the DIU generating a diu_<unit>_wack strobe is 2 cycles (1 cycle to register the request, 1 cycle to perform the arbitration - see Section 20.13.6).

• In the clock cycles following wack, the SoPEC Unit outputs <unit>_diu_data[63:0], asserting <unit>_diu_wvalid. Write data should be output as soon as possible after receiving the wack. Accessing registers, register arrays or SRAMs may incur different delays. The first <unit>_diu_wvalid pulse can occur in the clock cycle after diu_<unit>_wack. In the case of register array or SRAM access, the first <unit>_diu_wvalid pulse will occur 2 clock cycles after diu_<unit>_wack.

• Once all the write data has been output, <unit>_diu_wreq should be asserted again if there is a further request.

A timeout mechanism will be implemented to ensure that the DIU will not lock up if four <unit>_diu_wvalid pulses are not provided.
Figure 73. Write Protocol shown for a SoPEC Unit making a single 256-bit access
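The sketch below illustrates the timeout requirement from the DIU side of the write protocol. After diu_<unit>_wack, it gathers four 64-bit beats into one 256-bit word. If the beats stop arriving, it abandons the write instead of waiting forever. The class, its methods, the first-beat-in-bits-63..0 packing and TIMEOUT_CYCLES = 16 are all hypothetical. The document does not specify a timeout value or how the timeout is measured.

```python
MASK64 = (1 << 64) - 1


class WriteCollector:
    """DIU-side gathering of one 256-bit write from four 64-bit beats."""

    TIMEOUT_CYCLES = 16  # hypothetical value, not taken from the specification

    def __init__(self) -> None:
        self.active = False
        self.beats = []
        self.idle = 0

    def ack(self) -> None:
        """Call when the DIU issues diu_<unit>_wack for an arbitrated write."""
        self.active, self.beats, self.idle = True, [], 0

    def clock(self, wvalid: bool = False, data: int = 0):
        """Advance one pclk cycle.

        Returns ("done", word256) after the fourth beat, ("timeout", None) if
        TIMEOUT_CYCLES consecutive cycles pass without wvalid, otherwise None.
        """
        if not self.active:
            return None
        if wvalid:
            self.beats.append(data & MASK64)
            self.idle = 0
            if len(self.beats) == 4:
                self.active = False
                # First beat is placed in bits 63..0 (assumed ordering).
                word = sum(b << (64 * i) for i, b in enumerate(self.beats))
                return ("done", word)
            return None
        self.idle += 1
        if self.idle >= self.TIMEOUT_CYCLES:
            self.active = False          # abandon the write instead of locking up
            return ("timeout", None)
        return None
```

In this sketch, the idle counter restarts with every valid beat, so a slow but steady requester (for example one reading from an SRAM with a 2-cycle delay) never triggers the timeout.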



20.9.4 CPU Write Protocol



The CPU performs single writes of 8, 16 or 32 bits. The write data is transferred to the DIU over the cpu_dataout[31:0] bus. The write address cpu_adr[21:0] is byte aligned.

The CPU write protocol is:

• cpu_diu_wreq is asserted along with a valid cpu_adr[21:0] and a write mask cpu_diu_wmask[1:0] to indicate whether an 8, 16 or 32-bit access is required.

• The DIU acknowledges the request with diu_cpu_wack. The request should then be deasserted. The minimum number of cycles between cpu_diu_wreq being asserted and the DIU generating a diu_cpu_wack strobe is 2 cycles (1 cycle to register the request, 1 cycle to perform the arbitration - see Section 20.13.6).

• In the clock cycle following diu_cpu_wack, the CPU outputs cpu_dataout[31:0], asserting cpu_diu_wvalid. Write data should be output as soon as possible after receiving diu_cpu_wack. The earliest the cpu_diu_wvalid pulse can occur is the first clock cycle after diu_cpu_wack.

• Once the write data has been output, cpu_diu_wreq should be asserted again if there is a further request.
Figure 74. Write Protocol shown for a CPU making an 8, 16 or 32-bit access



20.9.5 CDU Write Protocol



The CDU performs four 64-bit writes to 4 contiguous 256-bit DRAM addresses, with the first address specified by cdu_diu_wadr[21:3]. The write address cdu_diu_wadr[21:3] is 64-bit aligned.
The write protocol is:

• cdu_diu_wreq is asserted along with a valid cdu_diu_wadr[21:3].

• The DIU acknowledges the request with diu_cdu_wack. The request should then be deasserted. The minimum number of cycles between cdu_diu_wreq being asserted and the DIU generating a diu_cdu_wack strobe is 2 cycles (1 cycle to register the request, 1 cycle to perform the arbitration - see Section 20.13.6).

• In the clock cycles following wack, the CDU outputs cdu_diu_data[63:0] and asserts cdu_diu_wvalid. Write data should be output as soon as possible after receiving the wack. Accessing registers, register arrays or SRAMs may incur different delays. The first cdu_diu_wvalid pulse can occur in the clock cycle after diu_cdu_wack. In the case of register array or SRAM access, the first cdu_diu_wvalid pulse will occur 2 clock cycles after diu_cdu_wack.

• Once all the write data has been output, cdu_diu_wreq should be asserted again if there is a further request.

A timeout mechanism will be implemented to ensure that the DIU will not lock up if four cdu_diu_wvalid pulses are not provided.
Figure 75. Write Protocol shown for CDU making four contiguous 64-bit accesses
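The sketch below decodes a CDU write address into its four target DRAM words. It also checks that the burst stays within one DRAM row of 8 columns of 256-bit words, as a page mode burst "in the same row" requires. The text does not say which 64-bit lane the second to fourth writes use. This sketch assumes they use the same lane (address bits 4..3) in each consecutive word. The function names are hypothetical.

```python
ROW_WORDS = 8  # each DRAM row holds 8 columns of 256-bit words


def cdu_write_targets(wadr_21_3: int) -> list:
    """(256-bit word address, 64-bit lane) for each of the four CDU writes.

    Assumption: the four writes hit consecutive 256-bit words and use the same
    64-bit lane (address bits 4..3) in each word.
    """
    first_word = wadr_21_3 >> 2
    lane = wadr_21_3 & 0x3
    return [(first_word + i, lane) for i in range(4)]


def byte_write_mask(lane: int) -> int:
    """32-bit byte-enable mask (1 = write) selecting one 64-bit lane."""
    return 0xFF << (8 * lane)


def burst_in_one_row(wadr_21_3: int) -> bool:
    """True if all four target words fall in the same DRAM row, as a page
    mode burst requires."""
    first_word = wadr_21_3 >> 2
    return first_word // ROW_WORDS == (first_word + 3) // ROW_WORDS
```

For example, cdu_diu_wadr[21:3] = 43 targets lane 3 of words 10 to 13, all of which lie in row 1. A burst starting at word 14 would cross into the next row.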
20.10 DIU ARBITRATION MECHANISM

The DIU will arbitrate access to the embedded DRAM. The arbitration scheme is outlined in the next sec- 
tions. 

20.10.1 Timeslot based arbitration scheme

Table 67 summarised the bandwidth requirements of the SoPEC requestors to DRAM. If we allocate the DIU requestors in terms of peak bandwidth, then we require 36 bits/cycle (at SF = 6) and 42.5 bits/cycle (at SF = 4) for all the requestors except the CPU.

A timeslot scheme is defined with 64 main timeslots. The number of used main timeslots is programmable 
between 0 and 64. 

Since DRAM read requestors, except for the CPU, are connected to the DIU via a 64-bit data bus, each 256-bit DRAM access requires 4 pclk cycles to transfer the read data over the shared read bus. The timeslot rotation period for 64 timeslots, each of 4 pclk cycles, is 256 pclk cycles or 1.6 µs, assuming pclk is 160 MHz. Each timeslot represents a 256-bit access every 256 pclk cycles, or 1 bit/cycle. This is the granularity of the majority of DIU requestors' bandwidth requirements in Table 67.

The SoPEC DIU requesters can be represented using 5 bits (Table on page 229). Using 64 timeslots means that, to allocate each timeslot to a requester, a total of 64 x 5-bit configuration registers is required for the 64 main timeslots.

Timeslot based arbitration works by having a pointer point to the current timeslot. When re-arbitration is signaled, the arbitration pointer will advance to the next timeslot. If the SoPEC Unit assigned to the current timeslot is not requesting, then the unused timeslot arbitration mechanism outlined in Section 20.10.4 is used to select the arbitration winner.

The timeslot pointer advances when the DIU issues the next command to the DRAM. Each timeslot therefore denotes a single access. The duration of the timeslot depends on the access.

If the SoPEC Unit pointed to by the current timeslot pointer is not requesting then the slot will be allocated 
according to the mechanism described in Section 20.10.5. 



Figure 76. Timeslot based arbitration
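The pointer-based scheme can be sketched as a table of 5-bit requester IDs with a wrapping pointer. The pointer advances only when a DRAM command is actually issued. The unused-slot policy of Sections 20.10.4 and 20.10.5 is not described here, so it is passed in as a hypothetical callback. The class name is illustrative, and it requires at least one used timeslot, whereas the hardware allows 0 to 64.

```python
from typing import Callable, List, Optional, Set


class TimeslotArbiter:
    """Timeslot arbitration over up to 64 programmable main timeslots."""

    MAX_SLOTS = 64

    def __init__(self, slots: List[int],
                 unused_slot_policy: Callable[[Set[int]], Optional[int]]) -> None:
        if not 1 <= len(slots) <= self.MAX_SLOTS:
            raise ValueError("this sketch needs 1..64 used timeslots")
        if any(not 0 <= s < 32 for s in slots):
            raise ValueError("requester IDs are 5-bit values")
        self.slots = slots
        self.ptr = 0
        self.unused_slot_policy = unused_slot_policy

    def arbitrate(self, requesting: Set[int]) -> Optional[int]:
        owner = self.slots[self.ptr]
        winner = owner if owner in requesting else self.unused_slot_policy(requesting)
        if winner is not None:
            # A DRAM command is issued, so the pointer advances to the next slot.
            self.ptr = (self.ptr + 1) % len(self.slots)
        return winner
```

In use, a placeholder policy such as `lambda r: min(r) if r else None` hands unused slots to the lowest-numbered requester. This placeholder is only for illustration and is not the mechanism defined by the document.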

20.10.2 Separate read and write arbitration windows 

For write accesses, except those from the CPU, 256 bits of write data are transferred from the SoPEC DIU write requestors over 64-bit write busses in 4 clock cycles. This write data transfer latency means that write accesses, except for CPU writes, must be arbitrated 4 cycles in advance. The [to be included figure and explanation] shows why this is necessary.
Since write arbitration must occur 4 cycles in advance, and the minimum duration of a timeslot is 3 cycles, the arbitration rules must be modified to initiate write accesses in advance accordingly. A timeslot lookahead pointer, shown in Figure 77, runs two timeslots ahead of the current timeslot pointer.



Figure 77. Timeslot based arbitration with separate read and write pointers
The following examples illustrate separate read and write timeslot arbitration.
Figure 78. Example (a), separate read and write arbitration

In Figure 78, writes are arbitrated two timeslots in advance. Reads are arbitrated in the same cycle. Writes can be arbitrated in the same cycle as a read. During arbitration, the command address of the arbitrated SoPEC Unit is captured.

Other examples are shown in Figure 79, Figure 80 and Figure 81. The actual timeslot order is always the same as the programmed timeslot order, i.e. out-of-order accesses do not occur and data coherency is never an issue.

Each write must always incur a latency of two timeslots. If the first write occurs in the first timeslot, then all following timeslots will incur a latency of two timeslots. This is shown in Figure 78 and Figure 79. If the first write occurs in the second timeslot, then all following timeslots will incur a latency of two timeslots. This is shown in Figure 80.
Figure 79. Example (b), separate read and write arbitration
Figure 80. Example (c), separate read and write arbitration
Figure 81. Example (d), separate read and write arbitration



Table 70 shows the 4 scenarios, depending on whether the current timeslot and timeslot lookahead pointers point to read or write accesses.

To be checked and updated: 



Table 70: Arbitration with separate windows for read and write accesses

Current timeslot pointer | Timeslot lookahead pointer | Actions
read | write | Initiate read transfer. Initiate write transfer.
read1 | read2 | Initiate read1 transfer.
write1 | write2 | Initiate write2 transfer.
write | read | No action.



If the current timeslot pointer points to a read access, then this will be initiated immediately.

If the timeslot lookahead pointer points to a write access, then this access is initiated immediately, or immediately after the read access associated with the current timeslot pointer is initiated.

When a write access is initiated, the DIU will capture the write address and will do the DRAM write two timeslots later, when the associated write data has been transferred to the DIU.

To be checked and updated: At initialisation, both pointers point to the first timeslot. The lookahead pointer advances to the second timeslot and then the third timeslot in successive clock cycles, until it is two timeslots ahead of the current timeslot pointer. Then both pointers advance in tandem. At each step, the rules in Table 70 are obeyed. This leads to the behaviour shown in the examples of Figure 78 to Figure 81.

CPU write accesses are excepted from the lookahead mechanism. 

Timing diagrams for these scenarios are shown in Section 20.13 Implementation. 
If the selected SoPEC Unit is not requesting, then there will be separate read and write selection for unused timeslots. This is described in Section 20.10.5.
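The Table 70 rules reduce to two independent checks. A read under the current pointer starts immediately. A write under the lookahead pointer is initiated two slots early. The sketch below encodes those rules and walks a programmed order in steady state, with the lookahead pointer two slots ahead. It deliberately leaves out the start-up phase (still marked "to be checked") and CPU writes. The function names are hypothetical.

```python
def table70_actions(current: str, lookahead: str) -> list:
    """Actions taken for one pair of pointer positions, per Table 70.

    current, lookahead: "read" or "write" (non-CPU accesses only; CPU writes
    are excepted from the lookahead mechanism and are not modelled here).
    """
    actions = []
    if current == "read":
        actions.append("initiate current read")      # reads start immediately
    if lookahead == "write":
        actions.append("initiate lookahead write")   # writes start two slots early
    # A write under the current pointer was already initiated when it was the
    # lookahead slot, so it needs no further action here.
    return actions


def walk(order: list) -> list:
    """Actions at each steady-state step with the lookahead pointer two slots
    ahead of the current pointer (the start-up phase is not modelled)."""
    n = len(order)
    return [table70_actions(order[i], order[(i + 2) % n]) for i in range(n)]
```

In the steady-state walk, every write is initiated exactly once, two slots before the current pointer reaches it. This preserves the programmed order as described for Figures 78 to 81.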

20.10.3 Arbitration of CPU accesses

The CPU can be allocated timeslots like any other DIU requestor. If CPU accesses are interleaved between the shared read bus accesses, then the DIU timeslots will take 3 cycles, as shown in Section 20.7.2.2. The timeslot rotation will then be faster than 256 pclk cycles.

What distinguishes the CPU from other SoPEC requestors is that the CPU requires minimum latency DRAM access, i.e. the CPU should preferably get the next available timeslot whenever it requests.

The minimum CPU read access latency is estimated in Table 71. This is the time between the CPU making a request to the DIU and receiving the read data back from the DIU. This ignores any latency associated with the CPU's caching mechanism.

Table 71. Estimated CPU read access latency ignoring caching

Action | Duration
register the CPU read request | 1 cycle
complete the arbitration of the request | 1 cycle
transfer the read address to the DRAM | 1 cycle
DRAM read latency | 2 cycles
register the read data | 1 cycle
TOTAL | 6 cycles



If the CPU, as is likely, requests DRAM access again immediately after receiving data from the DIU then 
the CPU can access every second timeslot. This assumes that interleaving is employed so that timeslots 
last 3 cycles. If the CPU access latency increases to 7 cycles then the CPU will only be able to access every 
third timeslot. 
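The link between read latency and how often the CPU can use a timeslot can be checked with a short calculation. This is a sketch under stated assumptions, not the DIU logic: it assumes the CPU re-requests as soon as read data returns and can only be served at the start of a timeslot. The function name `cpu_timeslot_stride` is hypothetical.

```python
import math


def cpu_timeslot_stride(read_latency_cycles: int, timeslot_cycles: int) -> int:
    """Number of timeslots between consecutive CPU accesses.

    Assumption: the CPU issues its next request as soon as read data
    returns, and a request can only be served at a timeslot boundary.
    """
    if read_latency_cycles <= 0 or timeslot_cycles <= 0:
        raise ValueError("latency and timeslot length must be positive")
    return math.ceil(read_latency_cycles / timeslot_cycles)


# Table 71 steps: register request, arbitrate, transfer address,
# DRAM read latency, register read data.
latency = sum([1, 1, 1, 2, 1])
print(latency, cpu_timeslot_stride(latency, 3))  # 6 2 (every second timeslot)
print(cpu_timeslot_stride(7, 3))                 # 3 (every third timeslot)
```

With 3-cycle interleaved timeslots, a 6-cycle latency gives a stride of 2 and a 7-cycle latency a stride of 3, matching the text above.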

If a cache hit occurs the CPU does not require DRAM access. For its next DIU access it will have to wait 
for its next assigned DIU slot. Cache hits therefore will reduce the number of DRAM accesses but not 
speed up any of those accesses. 

To avoid the CPU having to wait for its next timeslot it is desirable to have a mechanism for ensuring that 
the CPU always gets the next available timeslot without incurring any latency on the non-CPU timeslots. 

This can be done by defining each timeslot as consisting of a CPU access preceding a non-CPU access.
Each timeslot will last 6 cycles, i.e. a CPU access of 3 cycles and a non-CPU access of 3 cycles. This is
exactly the interleaving behaviour outlined in Section 20.7.2.2. If the CPU does not require an access, the
timeslot will take 3 or 4 cycles and the timeslot rotation will go faster. A summary is given in Table 72.



Table 72. Timeslot access times.

Access                         Duration                    Notes
CPU access + non-CPU access    3 + 3 = 6 cycles            Interleaved access
non-CPU access                 4 cycles                    Access and preceding access both to
                                                           shared read bus
non-CPU access                 3 cycles                    Access and preceding access not both
                                                           to shared read bus
CDU write access               3 + 1 + 1 + 1 = 6 cycles    Page mode select signal is clocked at
                                                           320 MHz
CDU write accesses require 6 cycles. CDU write accesses preceded by a CPU access require 9 cycles.
CDU timeslots therefore take longer than the timeslots of all other DIU requestors.

With a 256 cycle rotation there can be 42 accesses of 6 cycles. This is just enough timeslots for SF = 4
operation, ignoring implementation pipeline latencies.

For low scale factor applications, it is desirable to have more timeslots available in the same 256 cycle
rotation. So two counters of 4 bits each are defined, allowing the CPU to get a maximum of cpu_timeslots
in total_timeslots. A timeslot counter starts at total_timeslots and decrements every timeslot, while another
counter starts at cpu_timeslots and decrements every timeslot in which the CPU uses its access. When the
CPU timeslot counter reaches zero before the total_timeslots counter, no further CPU accesses are allowed.
When the total_timeslots counter reaches zero both counters are reset to their respective initial values.
When cpu_timeslots is set to zero then no accesses will be preceded by CPU accesses. The CPU can be
allocated timeslots like any other DIU requestor.
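The two-counter scheme can be modelled behaviourally as follows. This is a minimal sketch, not the RTL; the class and method names are hypothetical, and treating a total_timeslots value of zero as "no CPU pre-access" is an assumption made here to avoid an undefined window.

```python
class CpuSlotLimiter:
    """Sketch of the two 4-bit counters limiting CPU pre-accesses.

    The CPU may use at most cpu_timeslots pre-accesses in every window of
    total_timeslots timeslots. Class and method names are hypothetical.
    """

    def __init__(self, cpu_timeslots: int, total_timeslots: int) -> None:
        if not 0 <= cpu_timeslots <= total_timeslots <= 15:
            raise ValueError("need 0 <= cpu_timeslots <= total_timeslots <= 15")
        self.cpu_init = cpu_timeslots
        self.total_init = total_timeslots
        self.cpu_left = cpu_timeslots
        self.total_left = total_timeslots

    def step(self, cpu_requesting: bool) -> bool:
        """Advance by one timeslot; return True if a CPU pre-access is granted."""
        if self.total_init == 0:
            return False  # assumption: an empty window means no CPU pre-access
        granted = cpu_requesting and self.cpu_left > 0
        if granted:
            self.cpu_left -= 1  # CPU counter only decrements when the CPU uses its access
        self.total_left -= 1    # timeslot counter decrements every timeslot
        if self.total_left == 0:
            # Both counters reload when the total_timeslots counter reaches zero.
            self.cpu_left = self.cpu_init
            self.total_left = self.total_init
        return granted


limiter = CpuSlotLimiter(cpu_timeslots=2, total_timeslots=4)
print([limiter.step(True) for _ in range(8)])
# [True, True, False, False, True, True, False, False]
```

With cpu_timeslots = 2 and total_timeslots = 4 and the CPU always requesting, the CPU pre-accesses the first two timeslots of every four.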

If CPU accesses are interleaved between the shared read bus accesses then the DIU timeslots will take 3
cycles as shown in Section 20.7.2.2. Otherwise the timeslots will take 4 cycles each and the rotation will
take 256 cycles.

The various modes of operation are summarised in Table 73 with a nominal rotation period of 256 cycles.



Table 73. CPU timeslot allocation modes with nominal rotation period of 256 cycles

CPU Pre-access (i.e. cpu_timeslots = total_timeslots)
    Access duration: 6 cycles
    Timeslots per rotation: 42
    Notes: Each access is CPU + non-CPU. If the CPU does not use a timeslot then
    rotation is faster.

Fractional CPU Pre-access (i.e. cpu_timeslots < total_timeslots)
    Access duration: 4 or 6 cycles
    Timeslots per rotation: 42-64
    Notes: Each CPU + non-CPU access requires a 6 cycle timeslot.
    Individual non-CPU timeslots take 4 cycles if the current access and the
    preceding access are both to the shared read bus.
    Individual non-CPU timeslots take 3 cycles if the current access and the
    preceding access are not both to the shared read bus.

Interleaved (i.e. cpu_timeslots = 0)
    Access duration: 4 cycles
    Timeslots per rotation: 64
    Notes: Timeslot rotation is faster by 1 cycle for each CPU, write access or
    interleaved read access.


20.10.4 Sub-timeslots

Looking at the bandwidth requirements of the DIU requesters in Table 67, most DIU requesters require
bandwidths of 1 bit/cycle or multiples thereof. However, some of the requestors require much lower
bandwidth. This suggests that some sub-timeslots of lower granularity than a nominal 1 bit/cycle should be
defined.

There will be 2 sub-timeslots of 4 and 8 slots each. The bandwidth associated with each individual sub-slot
is nominally 0.25 and 0.125 bits/cycle respectively, assuming each slot lasts 4 cycles. Sub-timeslots can be
allocated to any number of main timeslots so that any multiple of the individual sub-timeslot bandwidth
can be obtained.
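The nominal sub-slot bandwidths can be reproduced with exact fractions. This sketch assumes each main timeslot nominally delivers 1 bit/cycle and that the sub-timeslot pointer visits its entries in turn; `sub_slot_bandwidth` is a hypothetical name.

```python
from fractions import Fraction


def sub_slot_bandwidth(sub_slots: int, main_slots_assigned: int = 1) -> Fraction:
    """Nominal bandwidth (bits/cycle) of one entry in a sub-timeslot.

    Assumes each main timeslot nominally provides 1 bit/cycle and that the
    sub-timeslot pointer visits its sub_slots entries in turn.
    """
    if sub_slots <= 0 or main_slots_assigned < 0:
        raise ValueError("sub_slots must be positive and main_slots_assigned non-negative")
    return Fraction(main_slots_assigned, sub_slots)


print(float(sub_slot_bandwidth(4)))  # 0.25
print(float(sub_slot_bandwidth(8)))  # 0.125
print(sub_slot_bandwidth(8, 3))      # 3/8
```

Assigning the 8-slot sub-timeslot to three main timeslots, for example, gives each of its entries 3/8 bit/cycle.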

Table 74. Sub-timeslot definition




Each sub-slot pointer is advanced each time it is accessed, regardless of whether its slot is used or not.

Sub-timeslots are similar in all other ways to main timeslots i.e. 

• they can have preceding CPU accesses in a similar manner.

• unused slots are decided by the same unused timeslot allocation mechanism (Section 20.10.5).

• a timeslot lookahead pointer is used to select writes (except for CPU writes) early to compensate for
write data transfer latency.



[Figure: current timeslot pointer over main timeslots m, n, n+2 and p, with the sub4timeslot and sub3timeslot tables]

Figure 82. Example sub-timeslot allocation

An example sub-timeslot allocation is shown in Figure 82. 

Every time main timeslots m and n are accessed, the SoPEC unit pointed to by the pointer in sub4timeslot
will win arbitration and the sub4timeslot pointer will advance. Similarly, every time main timeslots n+2
and p are accessed, the SoPEC unit pointed to by the pointer in sub3timeslot will win arbitration and the
sub3timeslot pointer will advance.
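The sub-timeslot pointer behaviour in the example above can be simulated in a few lines. This is an illustrative sketch only: the main timeslot table, the unit names placed in it and the helper names are hypothetical, and unused-slot re-allocation is ignored.

```python
class SubTimeslot:
    """A sub-timeslot table with its own pointer (names hypothetical)."""

    def __init__(self, entries):
        self.entries = list(entries)
        self.ptr = 0

    def select(self):
        unit = self.entries[self.ptr]
        # The pointer advances on every access, whether or not the unit uses the slot.
        self.ptr = (self.ptr + 1) % len(self.entries)
        return unit


def walk_rotation(main_table, sub_tables, accesses):
    """Return the unit selected at each of `accesses` main-timeslot accesses."""
    selected = []
    for i in range(accesses):
        entry = main_table[i % len(main_table)]
        if entry in sub_tables:
            selected.append(sub_tables[entry].select())
        else:
            selected.append(entry)
    return selected


main = ["HCU", "sub4", "LLU", "sub4"]  # two main timeslots map to sub4
subs = {"sub4": SubTimeslot(["TE(TD)", "PCU", "DNC", "CFU"])}
print(walk_rotation(main, subs, 8))
# ['HCU', 'TE(TD)', 'LLU', 'PCU', 'HCU', 'DNC', 'LLU', 'CFU']
```

Because two main timeslots point at the same 4-entry sub-timeslot, each of its entries is served once every two passes through this 4-entry main table.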

20.10.5 Allocating unused timeslots 

Unused slots are re-allocated on a two-level round-robin basis. This is best-effort traffic.

Each SoPEC requestor has two associated bits: RoundRobinLevel indicates whether it is in the level 1 or level
2 round-robin, and RoundRobinEnable indicates whether it is enabled or not in the selected round-robin.



Table 75. Round-robin selection

RoundRobinLevel    RoundRobinEnable    Selection
0                  0                   Not enabled
0                  1                   Level 1
1                  0                   Not enabled
1                  1                   Level 2

Separate round-robin trees are needed, one for read accesses and one for write accesses.

CDU write accesses cannot be included in the round-robin allocation for writes because CDU accesses take 6
cycles, whereas the write accesses which the CDU write could otherwise replace require only 3 or 4 cycles.

Round-robin allocations do not have CPU pre-accesses.

A pointer points to the currently allocated unit in each of the round-robin levels. If the unit pointed to in the
level 1 round-robin is requesting then this unit wins the arbitration and the pointer is advanced. If the unit
pointed to in the level 1 round-robin is not requesting then the next units in the level 1 round-robin are
examined. When a requesting unit is found this unit wins the arbitration and the pointer advances to the
next unit. If no unit is requesting then the pointer does not advance and the second level of round-robin is
examined in the same way as the first level.
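The two-level round-robin described above can be sketched behaviourally as follows. The class and function names are hypothetical, and the assignment of units to levels in the usage lines is only an example (Section 20.11.2 suggests placing PCU and DNC in level 1).

```python
class RoundRobin:
    """One round-robin level: a list of units and a pointer (names hypothetical)."""

    def __init__(self, units):
        self.units = list(units)
        self.ptr = 0

    def arbitrate(self, requesting):
        """Return the first requesting unit at or after the pointer, or None.

        The pointer moves to the unit after the winner; it does not move if
        no unit in this level is requesting.
        """
        n = len(self.units)
        for k in range(n):
            idx = (self.ptr + k) % n
            if self.units[idx] in requesting:
                self.ptr = (idx + 1) % n
                return self.units[idx]
        return None


def allocate_unused_slot(level1, level2, requesting):
    """Level 1 is examined first; level 2 only if no level 1 unit is requesting."""
    winner = level1.arbitrate(requesting)
    return winner if winner is not None else level2.arbitrate(requesting)


level1 = RoundRobin(["PCU", "DNC"])
level2 = RoundRobin(["CFU", "LBD", "HCU"])
print(allocate_unused_slot(level1, level2, {"DNC", "HCU"}))  # DNC
print(allocate_unused_slot(level1, level2, {"CFU", "HCU"}))  # CFU
print(allocate_unused_slot(level1, level2, {"CFU", "HCU"}))  # HCU
```

Separate instances would be kept for the read and write trees, since the text requires separate round-robins for each direction.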



Table 76. Write round-robin registers bit order

Unit      Bit
CPU(W)    0
SCB       1
SFU(W)    2
DWU       3



20.10.6 Background refresh controller

A background refresh controller should be implemented that will issue a refresh and pause the timeslot
rotator in case data is about to be lost. This scenario will only occur if insufficient timeslots were
allocated for refresh.
20.11 Guidelines for programming the DIU

Some guidelines for programming the DIU arbitration scheme are given in this section together with an 
example. 

20.11.1 Implementation pipeline latencies 

The number of allocated timeslots for each requester needs to take into account implementation pipeline
latencies. The number of timeslots is made programmable. This means 1 or 2 timeslots can be removed to
allow for implementation latency. Each timeslot will allow for 6 cycles implementation latency in CPU
Pre-access mode and 3 cycles otherwise for each single timeslot allocation in a rotation. If units are
allocated more than 1 timeslot in a rotation then the gap between slots may need to be reduced additionally
to allow for implementation latency.

20.11.2 Ensuring sufficient DNC and PCU access 

PCU command reads from DRAM are exceptional events and should complete in as short a time as
possible. Similarly, we must ensure there is sufficient free bandwidth for DNC accesses, e.g. when clusters
of dead nozzles occur. In Table 67 DNC is allocated 3 times average bandwidth. PCU and DNC can also be
allocated to the level 1 round-robin allocation for unused timeslots so that unused timeslot bandwidth is
available to them.



20.11.3 Basing timeslot allocation on peak bandwidths

Since the embedded DRAM provides sufficient bandwidth to use 1:1 compression rates for the CDU and
LBD, it is possible to simplify the main timeslot and sub-timeslot allocation by basing the allocation on
peak bandwidths. The only variable in determining timeslot allocations then becomes the scale factor.

If slot allocation is based on peak bandwidth requirements then DRAM access will be guaranteed to all
SoPEC requesters. If we do not allocate slots for peak bandwidth requirements then we can also allow for
the peaks deterministically by adding some cycles to the print line time.

20.11.4 Adjacent timeslot limitations 

All DIU requesters have state-machines which request and transfer the read or write data before requesting 
again. The time to perform this operation is greater than the time between adjacent timeslots. Therefore 
adjacent timeslots should not be assigned to a particular DIU requester because the requester will not be 
able to make use of all these slots. 

20.11.5 Line margin

The SFU must output 1 bit/cycle to the HCU. Since HCUNumDots may not be a multiple of 256 bits the
last 256-bit DRAM word on the line can contain extra zeros. In this case, the SFU may not be able to
provide 1 bit/cycle to the HCU. This could lead to a stall by the SFU. This stall could then propagate if the
margins being used by the HCU are not sufficient to hide it. The maximum stall can be estimated by the
calculation: DRAM service period - X scale factor * dots used from last DRAM read for HCU line.

Similarly, if the line length is not a multiple of 256-bits then e.g. the LLU could read data from DRAM 
which contains padded zeros. This could lead to a stall. This stall could then propagate if the page margins 
cannot hide it. 

A single addition of 256 cycles to the line length will suffice for all DIU requesters to mask these stalls. 
20.11.6 Example DIU programming

A full example is to be worked out.

Program MainTimeslot and SubnTimeslot configuration registers (Table 82) for the peak required bandwidths
of SoPEC Units according to the scale factor used for the document.

Program unused slots to use the round-robin allocation to share unused slots between all DIU requesters. 
20.12 CPU DRAM access performance

This section does not yet reflect any implementation pipeline latencies. 

The CPU's share of the timeslots can be specified in terms of guaranteed bandwidth and average band- 
width allocations. 

The CPU's access rate to memory depends on:

• the CPU read access latency i.e. the time between the CPU making a request to the DIU and receiving 
the read data back from the DIU. 

• how often it can get access to DIU timeslots. 

Table 71 estimated the CPU read latency ignoring caching as 6 cycles. 

How often the CPU can get access to DIU timeslots depends on the access type. 



Table 77. CPU DRAM access performance

CPU Pre-access
    Timeslot duration: 6 cycles
    CPU bandwidth: lower bound (guaranteed bandwidth) is 160 MHz / 6 = 26.67 MHz
    Notes: CPU can access every timeslot

Fractional CPU Pre-access
    Timeslot duration: 6 cycles
    CPU bandwidth: lower bound (guaranteed bandwidth) is 160 MHz * N / P
    Notes: CPU accesses precede a fraction N of timeslots, where
        N = C / T
        C = cpu_timeslots
        T = total_timeslots
        P = (6*C + 4*(T - C)) / T

Interleaved
    Timeslot duration: 4 cycles
    CPU bandwidth: see Section 20.12.1
    Notes: At SF = 6, 28 timeslots available for CPU.
           At SF = 4, 21 timeslots available for CPU.
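The guaranteed-bandwidth expressions in Table 77 can be evaluated directly. This sketch simply codes the formulas N = C/T, P = (6*C + 4*(T - C))/T and rate = 160 MHz * N / P; the function name is hypothetical and it covers only the two pre-access modes.

```python
def cpu_guaranteed_rate_mhz(cpu_timeslots: int, total_timeslots: int,
                            pclk_mhz: float = 160.0) -> float:
    """Lower bound on CPU access rate in (Fractional) CPU Pre-access mode.

    N = C / T, P = (6*C + 4*(T - C)) / T, rate = pclk * N / P.
    """
    c, t = cpu_timeslots, total_timeslots
    if not 0 < c <= t:
        raise ValueError("pre-access modes need 0 < cpu_timeslots <= total_timeslots")
    n = c / t
    p = (6 * c + 4 * (t - c)) / t
    return pclk_mhz * n / p


print(round(cpu_guaranteed_rate_mhz(4, 4), 2))  # 26.67 (CPU Pre-access)
print(cpu_guaranteed_rate_mhz(1, 2))            # 16.0
```

When C = T the expression reduces to 160 MHz / 6, the CPU Pre-access row of the table.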



For CPU Pre-access and Fractional CPU Pre-access modes, average and guaranteed CPU bandwidth are
equivalent since the CPU is limited to a certain fraction of timeslots.

If the CPU runs out of its instruction cache then instruction fetch performance is only limited by the on- 
chip bus protocol. With a 2 cycle bus protocol (address cycle + data cycle) the performance would be 80 
MHz. 



20.12.1 CPU DRAM access performance with interleaved access mode 

Table 78 shows the guaranteed periodic CPU access with 4 cycle DRAM access and pclk = 160 MHz.

Table 78. Guaranteed periodic CPU access with 4 cycle timeslots and pclk = 160 MHz

                              SF = 6        SF = 4
Timeslots left for CPU        28.25         21.5
Maximum wait for timeslot     12 cycles     12 cycles
CPU rate                      13.3 MHz      13.3 MHz

Since timeslots are integral multiples of 4 cycles, the maximum wait for a timeslot, and hence the minimum
CPU rate, must reflect this.
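One way to reproduce the Table 78 figures is to round the average gap between CPU slots up to whole 4-cycle timeslots. This is a reconstruction consistent with the tabulated values, not a method stated by the document; it assumes 64 timeslots per rotation and evenly spread CPU slots. The function name is hypothetical.

```python
import math


def max_wait_and_rate(slots_for_cpu: float, total_slots: int = 64,
                      slot_cycles: int = 4, pclk_mhz: float = 160.0):
    """Return (maximum wait in cycles, CPU rate in MHz).

    Assumption: CPU slots are spread evenly, so the gap between them is the
    ratio total_slots / slots_for_cpu rounded up to whole timeslots.
    """
    if slots_for_cpu <= 0:
        raise ValueError("slots_for_cpu must be positive")
    wait_cycles = math.ceil(total_slots / slots_for_cpu) * slot_cycles
    return wait_cycles, pclk_mhz / wait_cycles


for left in (28.25, 21.5):
    wait, rate = max_wait_and_rate(left)
    print(left, wait, round(rate, 1))
# 28.25 12 13.3
# 21.5 12 13.3
```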
Table 79 shows the average CPU access with 4 cycle DRAM access and pclk = 160 MHz. This will be a
bursty access.

Table 79. Average bursty CPU access with 4 cycle DRAM access and pclk = 160 MHz

                              SF = 6        SF = 4
Timeslots left for CPU        34.95         30.8
Maximum wait for timeslot     8 cycles      12 cycles
CPU rate                      20 MHz        13.3 MHz



Interleaving of CPU and write accesses with shared read bus accesses will mean some of the timeslots will
take 3 cycles rather than 4 cycles. This will mean that CPU slots will be available more frequently and
higher CPU performance is attainable.
20.13 Implementation



The DRAM Interface Unit (DIU) is partitioned into 2 logical blocks to facilitate design and verification. 

a. The DRAM Access Unit (DAU) which interfaces to the SoPEC DIU requesters.

b. The DRAM Controller Unit (DCU) which accesses the embedded DRAM. 




[Figure: the DRAM Access Unit (DAU) connects to the DRAM Controller Unit (DCU), which accesses the eDRAM]

Figure 83. DIU Partition

The basic principle in design of the DIU is to ensure that the eDRAM is accessed at its maximum rate 
while keeping the circuit latency for each access as low as possible. 

The DCU is designed to interface with single bank 20 Mbit IBM Cu-11 embedded DRAM performing
random accesses every 3 cycles. Page mode write accesses, associated with the CDU, are also supported.

The DAU is designed to support interleaved accesses allowing the DRAM to be accessed every 3 cycles
where back-to-back accesses do not occur over the shared 64-bit read data bus.

20.13.1 Definition of DCU IO

Table 80. DCU interface

Port name                   Pins  I/O  Description
Clocks and Resets
pclk                        1     In   SoPEC Functional clock
prst_n                      1     In   Active-low, synchronous reset in pclk domain
Inputs from DAU
dau_dcu_cmdavail            1     In   Signal indicating a DAU command is available i.e.
                                       dau_cmd_adr, dau_cmd_rwn and dau_cmd_refresh are valid.
dau_dcu_cmdadr[21:5]        17    In   Signal indicating the address for the DRAM access. This
                                       is a 256-bit aligned DRAM address.
dau_dcu_cmdrwn              1     In   Signal indicating the direction for the DRAM access
                                       (1=read, 0=write).
dau_dcu_cmdrefresh          1     In   Signal indicating that a refresh command is to be issued.
                                       If asserted dau_cmd_adr and dau_cmd_rwn will be ignored.
dau_dcu_wdata               256   In   256-bit write data to DCU
dau_dcu_wmask               256   In   256-bit write data mask to DCU
dau_dcu_wvalid              17    In   Signal indicating valid write data and write mask.
Outputs to DAU
dcu_dau_cmdaccept           1     Out  Signal indicating that the DCU has accepted a valid
                                       command from the DAU.
dcu_dau_refreshcomplete     1     Out  Signal indicating that the DCU has completed a refresh.
dcu_dau_rdata               256   Out  256-bit read data from DCU.
dcu_dau_rvalid              1     Out  Signal indicating valid read data on dcu_rdata.
Outputs to DRAM
Inputs from DRAM
20.13.2 Definition of DAU IO

Table 81. DAU interface

Port name                   Pins  I/O  Description
Clocks and Resets
pclk                        1     In   SoPEC Functional clock
prst_n                      1     In   Active-low, synchronous reset in pclk domain
CPU Interface
cpu_adr[9:2]                8     In   CPU address bus. 8 bits are required to decode the
                                       address space for this block
cpu_dataout[31:0]           32    In   Shared write data bus from the CPU
diu_cpu_data[31:0]          32    Out  Configuration, status and debug read data bus to the CPU
cpu_rwn                     1     In   Common read/not-write signal from the CPU
cpu_acode[1:0]              2     In   CPU access code signals.
                                       cpu_acode[0] - Program (0) / Data (1) access
                                       cpu_acode[1] - User (0) / Supervisor (1) access
                                       The DAU will only allow supervisor mode accesses to
                                       data space.
cpu_diu_sel                 1     In   Block select from the CPU. When cpu_diu_sel is high both
                                       cpu_addr and cpu_dataout are valid
diu_cpu_rdy                 1     Out  Ready signal to the CPU. When diu_cpu_rdy is high it
                                       indicates the last cycle of the access. For a write cycle
                                       this means cpu_dataout has been registered by the block
                                       and for a read cycle this means the data on diu_cpu_data
                                       is valid.
diu_cpu_berr                1     Out  Bus error signal to the CPU indicating an invalid access.
DIU Read Interface to SoPEC Units
<unit>_diu_rreq             1     In   SoPEC unit requests DRAM read. A read request must be
                                       accompanied by a valid read address.
<unit>_diu_radr[21:5]       17    In   Read address to DIU
                                       17 bits wide (256-bit aligned word).
diu_<unit>_rack             1     Out  Acknowledge from DIU that read request has been accepted
                                       and new read address can be placed on <unit>_diu_radr
diu_data[63:0]              64    Out  Data from DIU to SoPEC Units except CPU.
                                       First 64-bits is bits 63:0 of 256 bit word
                                       Second 64-bits is bits 127:64 of 256 bit word
                                       Third 64-bits is bits 191:128 of 256 bit word
                                       Fourth 64-bits is bits 255:192 of 256 bit word
dram_cpu_data[255:0]        256   Out  256-bit data from DRAM to CPU.
diu_<unit>_rvalid           1     Out  Signal from DIU telling SoPEC Unit that valid read data
                                       is on the diu_data bus
DIU Write Interface to SoPEC Units
<unit>_diu_wreq             1     In   SoPEC unit requests DRAM write. A write request must be
                                       accompanied by a valid write address.
<unit>_diu_wadr[21:5]       17    In   Write address to DIU except CPU, CDU
                                       17 bits wide (256-bit aligned word)
cpu_adr[21:0]               22    In   CPU Write address to DIU
                                       22 bits wide (8-bit aligned word)
                                       Addresses cannot cross a 256-bit word DRAM boundary.
cpu_diu_wmask[1:0]          2     In   Flag indicating format of CPU write to DRAM
                                       00: write
                                       01: 16-bit write
                                       11: reserved
                                       cpu_adr[2:0] are driven in accordance with the width of
                                       the data access indicated by cpu_diu_wmask. Addresses
                                       cannot cross a 256-bit word DRAM boundary.
cdu_diu_wadr[21:3]          19    In   CDU Write address to DIU
                                       19 bits wide (64-bit aligned word)
                                       Addresses cannot cross a 256-bit word DRAM boundary
diu_<unit>_wack             1     Out  Acknowledge from DIU that write request has been accepted
                                       and new write address can be placed on <unit>_diu_wadr
<unit>_diu_data[63:0]       64    In   Data from SoPEC Unit to DIU except CPU.
                                       First 64-bits is bits 63:0 of 256 bit word
                                       Second 64-bits is bits 127:64 of 256 bit word
                                       Third 64-bits is bits 191:128 of 256 bit word
                                       Fourth 64-bits is bits 255:192 of 256 bit word
cpu_dataout[31:0]           32    In   Data from CPU to DIU.
<unit>_diu_wvalid           1     In   Signal from SoPEC Unit indicating that data on
                                       <unit>_diu_data is valid.
Outputs to DCU
dau_dcu_cmdavail            1     Out  Signal indicating a DAU command is available i.e.
                                       dau_cmd_adr, dau_cmd_rwn and dau_cmd_refresh are valid.
dau_dcu_cmdadr[21:5]        17    Out  Signal indicating the address for the DRAM access. This
                                       is a 256-bit aligned DRAM address.
dau_dcu_cmdrwn              1     Out  Signal indicating the direction for the DRAM access
                                       (1=read, 0=write).
dau_dcu_cmdrefresh          1     Out  Signal indicating that a refresh command is to be issued.
                                       If asserted dau_cmd_adr and dau_cmd_rwn will be ignored.
dau_dcu_wdata               256   Out  256-bit write data to DCU
dau_dcu_wmask               256   Out  256-bit write data mask to DCU.
dau_dcu_wvalid              17    Out  Signal indicating valid write data and write mask.
Inputs from DCU
dcu_dau_cmdaccept           1     In   Signal indicating that the DCU has accepted a valid
                                       command from the DAU.
dcu_dau_refreshcomplete     1     In   Signal indicating that the DCU has completed a refresh.
dcu_dau_rdata               256   In   256-bit read data from DCU.
dcu_dau_rvalid              1     In   Signal indicating valid read data on dcu_rdata.



The CPU subsystem bus interface is described in more detail in Section 11.4.3. The DAU block will only
allow supervisor mode accesses to data space (i.e. cpu_acode[1:0] = b11). All other accesses will result in
diu_cpu_berr being asserted.
20.13.3 DAU Configuration Registers



Table 82. DAU configuration registers

Address        Register              #Bits    Reset   Description
0x00           Reset                 1        0x1     A write to this register causes a reset of the
                                                      DIU. This register can be read to indicate the
                                                      reset state:
                                                      0 - reset in progress
                                                      1 - reset not in progress
0x04           Refresh Period        10       0x000   Background refresh controller.
                                                      When set to 0 background refresh is off;
                                                      otherwise the value indicates the number of
                                                      cycles between each refresh.
0x08           NumMainTimeslots      7        0x40    Number of main timeslots (0-64)
0x09           CPUTimeslots          4        0x0     CPUTimeslots out of TotalTimeslots are
                                                      available for the CPU.
0x0A           TotalTimeslots        4        0x0     CPUTimeslots out of TotalTimeslots are
                                                      available for the CPU.
0x100-0x1FC    MainTimeslot          64x[5]   0x00    Programmable main timeslots (up to 64 main
                                                      timeslots)
0x200-0x208    Sub3Timeslot          3x[5]    0x00    Programmable sub-timeslots (3 timeslots)
0x210-0x21C    Sub4Timeslot          4x[5]    0x00    Programmable sub-timeslots (4 timeslots)
0x220-0x234    Sub6Timeslot          6x[5]    0x00    Programmable sub-timeslots (6 timeslots)
0x300          ReadRoundRobinLevel   12       0x000   For each read requester plus refresh:
                                                      0 = level 1 of round-robin
                                                      1 = level 2 of round-robin


Each main timeslot and sub-timeslot can be assigned a SoPEC DIU requestor according to Table 83. Main
timeslots can be assigned SoPEC units, refresh and sub-timeslots. Sub-timeslots can be assigned SoPEC
units and refresh but not other sub-timeslots.



Table 83. SoPEC DIU Encoding

Type      Unit name        Binary     Hex
          None             b00000     0x00
Write     CPU(W)           b00001     0x01
          SCB              b00010     0x02
          CDU(W)           b00011     0x03
          SFU(W)           b00100     0x04
          DWU              b00101     0x05
Read      CPU(R)           b00110     0x06
          CDU(R)           b00111     0x07
          CFU              b01000     0x08
          LBD              b01001     0x09
          SFU(R)           b01010     0x0A
          TE(TD)           b01011     0x0B
          TE(TFS)          b01100     0x0C
          HCU              b01101     0x0D
          DNC              b01110     0x0E
          LLU              b01111     0x0F
          PCU              b10000     0x10
Others    Refresh          b10001     0x11
          Sub1timeslot     b10010     0x12
          Sub2timeslot     b10011     0x13
          Sub3timeslot     b10100     0x14



ReadRoundRobinLevel and ReadRoundRobinEnable registers are encoded in the bit order defined in 
Table 84. 

Table 84. Read round-robin registers bit order

Unit       Bit
CPU(R)     0
CDU(R)     1
CFU        2
LBD        3
SFU(R)     4
TE(TD)     5
TE(TFS)    6
HCU        7
DNC        8
LLU        9
PCU        10
Refresh    11
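A host-side helper for building register values from Tables 83 and 84 might look like the sketch below. The dictionaries are transcribed from the tables; the helper names are hypothetical, and register access itself (addresses in Table 82) is not modelled.

```python
# Table 83 encodings and Table 84 read round-robin bit positions.
DIU_ENCODING = {
    "None": 0x00,
    "CPU(W)": 0x01, "SCB": 0x02, "CDU(W)": 0x03, "SFU(W)": 0x04, "DWU": 0x05,
    "CPU(R)": 0x06, "CDU(R)": 0x07, "CFU": 0x08, "LBD": 0x09, "SFU(R)": 0x0A,
    "TE(TD)": 0x0B, "TE(TFS)": 0x0C, "HCU": 0x0D, "DNC": 0x0E, "LLU": 0x0F,
    "PCU": 0x10,
    "Refresh": 0x11, "Sub1timeslot": 0x12, "Sub2timeslot": 0x13, "Sub3timeslot": 0x14,
}
READ_RR_BIT = {
    "CPU(R)": 0, "CDU(R)": 1, "CFU": 2, "LBD": 3, "SFU(R)": 4, "TE(TD)": 5,
    "TE(TFS)": 6, "HCU": 7, "DNC": 8, "LLU": 9, "PCU": 10, "Refresh": 11,
}
SUB_TIMESLOTS = {"Sub1timeslot", "Sub2timeslot", "Sub3timeslot"}


def encode_main_timeslots(units):
    """5-bit codes for up to 64 main timeslot entries."""
    if len(units) > 64:
        raise ValueError("at most 64 main timeslots")
    return [DIU_ENCODING[u] for u in units]


def encode_sub_timeslot(units):
    """Sub-timeslot entries may name SoPEC units or refresh, not other sub-timeslots."""
    nested = [u for u in units if u in SUB_TIMESLOTS]
    if nested:
        raise ValueError(f"sub-timeslots cannot contain {nested}")
    return [DIU_ENCODING[u] for u in units]


def read_round_robin_mask(units):
    """Build a 12-bit ReadRoundRobinLevel/Enable value with one bit per unit."""
    mask = 0
    for u in units:
        mask |= 1 << READ_RR_BIT[u]
    return mask


print([hex(c) for c in encode_main_timeslots(["HCU", "LLU", "Sub1timeslot"])])
# ['0xd', '0xf', '0x12']
print(hex(read_round_robin_mask(["DNC", "PCU"])))  # 0x500
```

Rejecting nested sub-timeslots in `encode_sub_timeslot` mirrors the rule that sub-timeslots can be assigned SoPEC units and refresh only.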
20.13.4 DIU Partition

[Figure: CPU interface signals and SoPEC unit requests enter the DAU, which connects to the DRAM Controller Unit (DCU) and the eDRAM]

Figure 84. DIU Partition

Doc: SoPEC_hardware_desigh 
Version: 2.3 



S3 Proprietary Document 



29 Nov 2002 
Page 231 



SoPEC : Hardware Design 




20.13.5 CPU Interface and Arbitration Logic Sub-block

Table 85. CPU Interface and Arbitration Logic Sub-block IO Definition

Port name                   Pins  I/O  Description
Clocks and Resets
pclk                        1     In   System Clock
prst_n                      1     In   System reset, synchronous active low
CPU Interface data and control signals
cpu_addr[12:2]              11    In   CPU address bus. 11 bits are required to decode the
                                       address space for this block
cpu_dataout[31:0]           32    In   Shared write data bus from the CPU
diu_cpu_datain[31:0]        32    Out  Read data bus from the DIU to the CPU
cpu_rwn                     1     In   Common read/not-write signal from the CPU
cpu_fc[2:0]                 3     In   CPU Function Code signals.
cpu_diu_sel                 1     In   Block select from the CPU. When cpu_diu_sel is high both
                                       cpu_addr and cpu_dataout are valid
diu_cpu_rdy                 1     Out  Ready signal to the CPU. When diu_cpu_rdy is high it
                                       indicates the last cycle of the access. For a write cycle
                                       this means cpu_dataout has been registered by the block
                                       and for a read cycle this means the data on
                                       diu_cpu_datain is valid.
diu_cpu_berr                1     Out  Bus error signal to the CPU indicating an invalid access.
DIU Read Interface to SoPEC Units
<unit>_diu_rreq                   In   SoPEC unit requests DRAM read.
DIU Write Interface to SoPEC Units
<unit>_diu_wreq                   In   SoPEC unit requests DRAM write.
Internal Inputs from other DAU blocks
re_arbitrate                1     In   Signal telling the arbitration logic to choose the next
                                       arbitration winner.
debug_req                   1     In   DIU request signal from Debug logic.
Internal Outputs from other DAU blocks
debug_enable                17    Out  1 = Enable DIU Debug
                                       0 = Disable DIU Debug
debug_start_adr[21:5]       17    Out  DIU debug start address.
debug_end_adr[21:5]         17    Out  DIU debug end address.
unit_gnt                    1     Out  Signal lasting 1 cycle which indicates arbitration has
                                       occurred. Indicates adr_sel and write_due are valid.
adr_sel[4:0]                5     Out  Signal indicating which requesting SoPEC Unit has won
                                       arbitration.
write_due[1:0]              2     Out  write_due[0] indicates if the next arbitration winner will
                                       be a write access. write_due[1] indicates if the
                                       subsequent arbitration winner will be a write access.
                                       Valid on unit_gnt. write_due[1] is only required where
                                       2 cycle random DRAM access is possible.
access_type[3:0]            4     Out  Signal indicating the origin of the winning arbitration
                                       0000 = main timeslot
                                       0001 = sub1timeslot
                                       0010 = sub2timeslot
                                       0011 = sub3timeslot
                                       0100 = sub4timeslot
                                       0101 = round-robin level 1
                                       0110 = round-robin level 2
                                       0111 = priority
                                       1000 = cpu round robin


20.13.5.1 CPU Interface and Arbitration Logic Sub-block Description

The CPU Interface and Arbitration Logic sub-block is shown in Figure 85. The CPU interface sub-block
provides for the CPU to access DAU specific registers by reading or writing to the DAU address space.

The CPU subsystem bus interface is described in more detail in Section 11.4.3. The DIU block will
only allow supervisor mode read or write accesses to data space (i.e. cpu_fc[2:0] = b101). All other
accesses will result in diu_cpu_berr being asserted.
[Figure: CPU Interface, configuration, Arbitration Logic and Refresh Counter blocks; inputs <unit>_diu_rreq, <unit>_diu_wreq, re_arbitrate, debug_req and the CPU bus signals; outputs unit_gnt, adr_sel, write_due, access_type, debug_enable, debug_start_adr[21:5] and debug_end_adr[21:5]]

Figure 85. CPU Interface and Arbitration Logic

Arbitration is triggered by the signal re_arbitrate, with the signal unit_gnt indicating that arbitration has
occurred and the arbitration winner indicated by adr_sel[4:0]. Arbitration should take 1 clock cycle, so
unit_gnt is asserted the clock cycle after re_arbitrate and stays high for 1 clock cycle. adr_sel[4:0]
remains persistent until arbitration occurs again. The arbitration timing is shown in Figure 86.



[Figure: waveforms for pclk, re_arbitrate, unit_gnt, adr_sel[4:0] (values 00000, 01000, 00001) and write_due[1:0] (values 00, 01, 00)]

Figure 86. Arbitration timing

The basic arbitration table is 64 entries of 5 bits. Arbitration works by having a pointer advance to the next
entry in the table whenever re_arbitrate is asserted. Four of the main slots can be assigned to a set of 4
sub-slots. Each of these sub-slots requires 4 entries of 5 bits and a pointer. The pointer advances along the
sub-entries every time arbitration selects a sub-slot. Slots can also be assigned to round-robin or priority
allocations.

write_due[1:0] will point to timeslots 1 and 2 arbitration cycles in the future. write_due[1:0] looks ahead
in the arbitration and indicates if the next arbitration winner (based on currently requesting write
requesters) and the subsequent arbitration winner will be a write access. The write_due[1:0] functionality
is added to allow write transactions to be selected early by the command multiplexor sub-block to offset the
latency of transferring the write data over the write data busses before the DRAM access can occur. This is
only required for IBM DRAM. With Toshiba and Philips DRAM, the DRAM access latency masks any
latency in transferring the write data.
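The write_due[1:0] lookahead can be illustrated over a fixed main timeslot table. This is a simplified sketch: it considers only fixed timeslot assignments and ignores re-allocation of unused slots; excluding CPU writes follows the earlier statement that CPU writes are excepted from the lookahead mechanism. The function name, table contents and requester sets are hypothetical.

```python
# CPU writes are excepted from the lookahead mechanism (Section 20.10).
LOOKAHEAD_EXEMPT = {"CPU(W)"}


def write_due(table, ptr, requesting_writers):
    """Return write_due[1:0] as an int.

    Bit 0 is set if the timeslot after `ptr` belongs to a requesting
    writer; bit 1 likewise for the timeslot after that. Simplification:
    only fixed main-timeslot assignments are considered.
    """
    due = 0
    for k in (0, 1):
        unit = table[(ptr + 1 + k) % len(table)]
        if unit in requesting_writers and unit not in LOOKAHEAD_EXEMPT:
            due |= 1 << k
    return due


table = ["HCU", "DWU", "SFU(W)", "LLU"]
print(format(write_due(table, 0, {"DWU"}), "02b"))            # 01
print(format(write_due(table, 0, {"DWU", "SFU(W)"}), "02b"))  # 11
print(format(write_due(table, 3, {"DWU"}), "02b"))            # 10
```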

If an assigned slot is not used (because its corresponding SoPEC Unit is not requesting) then it can be
re-assigned in a number of ways. Slots can be re-assigned using a round-robin or a priority based assignment.
Round-robin assignment can have up to 21 round-robin slots. Its implementation requires a pointer to keep
track of the currently assigned round-robin slot. If the slot pointed to is not requesting then the next
round-robin slot is considered and so on. Round-robin is effectively a priority assignment with the slots
assigned a priority according to the slot order. Round-robin and priority arbitration will be calculated in the
hierarchical manner shown in Figure 87. If no round-robin slots are requesting then DRAM access is
re-assigned according to priority.



[Figure: Unit1 to Unit20 combined pairwise in a tree to produce a single Result]

Figure 87. Hierarchical priority based arbitration calculation

It may be desirable to have 2 levels of round-robin arbitration. If there is no requester in the first level, then 
the arbitration looks at the second level. If there is no second-level requester then the DRAM access is 
assigned according to priority. 
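The re-assignment order described above (round-robin levels first, then priority) can be sketched in Python. This is a behavioural model only, not the hierarchical logic of Figure 87; it assumes that after a win the pointer moves to the slot following the winner, and the class and function names are hypothetical.

```python
class RoundRobinLevel:
    """One round-robin level: an ordered list of unit indices and a pointer."""

    def __init__(self, slots):
        self.slots = list(slots)
        self.ptr = 0

    def pick(self, requesting):
        """Scan from the pointer and return the first requesting unit, or None.

        On a win the pointer moves to the slot after the winner (assumption).
        """
        n = len(self.slots)
        for i in range(n):
            idx = (self.ptr + i) % n
            if self.slots[idx] in requesting:
                self.ptr = (idx + 1) % n
                return self.slots[idx]
        return None


def reassign(levels, priority, requesting):
    """Re-assign an unused slot: round-robin levels in order, then fixed priority."""
    for level in levels:
        winner = level.pick(requesting)
        if winner is not None:
            return winner
    for unit in priority:  # highest priority first
        if unit in requesting:
            return unit
    return None


level1 = RoundRobinLevel([4, 6, 9])
level2 = RoundRobinLevel([2, 3])
priority = [7, 8]
print(reassign([level1, level2], priority, {6, 9}))  # 6
print(reassign([level1, level2], priority, {6, 9}))  # 9
print(reassign([level1, level2], priority, {3, 8}))  # 3
print(reassign([level1, level2], priority, {8}))     # 8
```

Passing a single level gives the one-level scheme; passing two gives the two-level variant.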
There is a background refresh counter which is reset whenever the arbitration logic selects a refresh. If the
refresh counter reaches refresh_period[9:0] it stops counting and asserts a signal ref_issue, which forces
the next arbitration winner to be a refresh. This will delay any fixed timeslot allocation by one slot. If the
slot was being re-allocated to a non-fixed timeslot requester then refresh will win the arbitration.
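A minimal Python model of the refresh counter, assuming it advances once per clock and that `refresh_selected` (a hypothetical input) is true in the cycle in which arbitration selects a refresh:

```python
class RefreshCounter:
    """Sketch of the background refresh counter with a 10-bit refresh_period."""

    def __init__(self, refresh_period):
        if not 0 <= refresh_period < 1024:
            raise ValueError("refresh_period[9:0] is a 10-bit value")
        self.period = refresh_period
        self.count = 0

    @property
    def ref_issue(self):
        # Forces the next arbitration winner to be a refresh.
        return self.count >= self.period

    def clock(self, refresh_selected):
        """Advance one cycle."""
        if refresh_selected:
            self.count = 0          # reset whenever a refresh is selected
        elif self.count < self.period:
            self.count += 1         # stops counting at refresh_period


rc = RefreshCounter(3)
for _ in range(5):
    rc.clock(False)
print(rc.count, rc.ref_issue)  # 3 True
rc.clock(True)
print(rc.count, rc.ref_issue)  # 0 False
```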

Debug can also request DIU access via the signal debug_req. When enabled, DIU debug access will obtain
DIU access at the expense of CPU DIU slots.



20.13.5.2 Arbitration of writes

For write accesses, except the CPU, 256 bits of write data are transferred from the SoPEC DIU write
requestors over 64-bit write busses in 4 clock cycles. This write data transfer latency means that write
accesses must be arbitrated two timeslots in advance. Figure 88, which repeats the DIU write protocol
shown in Figure 73, shows why this is necessary.

If this were a read access, then the read address is captured by the DIU in cycle 4 and presented to the 
DRAM in cycle 5. The read access at the DRAM will start in cycle 5. This corresponds to timeslot A 
write access cannot start until all the write data is available i.e. until cycle 9. This is a 4 cycle delay. The 
write access at the DRAM will not start until cycle 11 which corresponds to the start of timeslot 
Therefore, write arbitration must occur 2 timeslots in advance and incurs an additional latency of 2 cycles. 

The exact timing of read and write accesses will be outlined in Section 20.13 Implementation. 



Figure 88. Write Protocol shown for a SoPEC Unit making a single 256-bit access (signals: pclk,
<unit>_diu_wreq, <unit>_diu_wadr[21:5], diu_<unit>_wack, <unit>_diu_data[63:0], <unit>_diu_wvalid;
timeslots n to n+4; cycle numbers 1 to 11)
20.13.6 Command Multiplexor Sub-block

Table 86. Command Multiplexor Sub-block IO Definition

Port name | Pins | I/O | Description
Clocks and Resets
pclk | 1 | In | System clock
prst_n | 1 | In | System reset, synchronous active low
DIU Read Interface to SoPEC Units
<unit>_diu_radr[21:5] | 17 | In | Read address to DIU. 17 bits wide (256-bit aligned word).
diu_<unit>_rack | 1 | Out | Acknowledge from DIU that read request has been accepted and new read address can be placed on <unit>_diu_radr.
DIU Write Interface to SoPEC Units
<unit>_diu_wadr[21:5] | 17 | In | Write address to DIU except CPU, SCB, CDU. 17 bits wide (256-bit aligned word).
cpu_addr[21:0] | 22 | In | CPU write address to DIU. 22 bits wide (8-bit aligned word). Addresses cannot cross a 256-bit word DRAM boundary.
cdu_diu_wadr[21:3] | 19 | In | CDU write address to DIU. 19 bits wide (64-bit aligned word). Addresses cannot cross a 256-bit word DRAM boundary.
diu_<unit>_wack | 1 | Out | Acknowledge from DIU that write request has been accepted and new write address can be placed on <unit>_diu_wadr.
debug_diu_wadr[21:5] | 17 | In | Debug write address to DIU. 17 bits wide (256-bit aligned word).
diu_debug_wack | 1 | Out | Acknowledge from DIU that debug write request has been accepted and new write address can be presented.
Internal Inputs
unit_gnt | 1 | In | Signal lasting 1 cycle which indicates arbitration has occurred.
adr_sel[4:0] | 5 | In | Signal indicating which requesting SoPEC Unit has won arbitration.
write_due[1:0] | 2 | In | write_due[0] indicates if the next arbitration winner will be a write access. write_due[1] indicates if the subsequent arbitration winner will be a write access. Valid on unit_gnt. write_due[1] is only required where 2 cycle random DRAM access is possible.
read_cmd_avail | 1 | In | Signal indicating that the command multiplexor can issue read accesses.
write_cmd_avail | 1 | In | Signal indicating that the command multiplexor can issue write accesses.
write_data_avail | 1 | In | Signal indicating that valid write data is available for the current command.
Internal Outputs
re_arbitrate | 1 | Out | Signal telling the arbitration logic to choose the next arbitration winner.
Signals from DCU
dcu_cmd_accept | 1 | In | Signal indicating that the DCU has accepted a valid command from the DAU.
Signals to DCU
dau_cmd_avail | 1 | Out | Signal indicating a DAU command is available, i.e. dau_cmd_adr, dau_cmd_rwn and dau_cmd_refresh are valid.
dau_cmd_adr[21:5] | 17 | Out | Signal indicating the address for the DRAM access. This is a 256-bit aligned DRAM address.
dau_cmd_rwn | 1 | Out | Signal indicating the direction for the DRAM access.
dau_cmd_refresh | 1 | Out | Signal indicating that a refresh command is to be issued. If asserted, dau_cmd_adr and dau_cmd_rwn will be ignored.



20.13.6.1 DAU-DCU Interface Description

dau_cmd_avail indicates that the Command Multiplexor has a valid command to issue. When
dau_cmd_avail is asserted the signals dau_cmd_adr[21:5], dau_cmd_rwn and dau_cmd_refresh are valid.
In the case of a write command, dau_cmd_avail will not be asserted until the Read-Write Data Multi-
plexor sub-block has valid write data to supply, indicated by write_data_avail, as well as a valid write
address.

The DCU indicates that it has accepted a command by asserting dcu_cmd_accept for 1 cycle. This indi-
cates to the Command Multiplexor that it can supply a new command to the DCU. The DCU cannot assert
dcu_cmd_accept until the Command Multiplexor presents a valid command, as indicated by
dau_cmd_avail.
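The handshake can be illustrated with a cycle-by-cycle Python sketch. The DCU's own readiness (`dcu_ready`) and the write data timing (`write_data_avail`) are modelled as hypothetical callables, since their internal behaviour is not described here.

```python
from collections import deque


def run_handshake(commands, dcu_ready, write_data_avail, max_cycles=1000):
    """Cycle-by-cycle sketch of the dau_cmd_avail / dcu_cmd_accept handshake.

    commands         -- list of (kind, adr); kind is 'read', 'write' or 'refresh'
    dcu_ready        -- function(cycle) -> bool: whether the DCU is able to take
                        a command this cycle (hypothetical; internal to the DCU)
    write_data_avail -- function(cycle) -> bool: models write_data_avail
    Returns a list of (cycle, kind, adr) for each command the DCU accepts.
    """
    queue = deque(commands)
    accepted = []
    for cycle in range(max_cycles):
        if not queue:
            break
        kind, adr = queue[0]
        # dau_cmd_avail: writes additionally wait for valid write data.
        dau_cmd_avail = kind != "write" or write_data_avail(cycle)
        # The DCU may only strobe dcu_cmd_accept while dau_cmd_avail is high.
        dcu_cmd_accept = dau_cmd_avail and dcu_ready(cycle)
        if dcu_cmd_accept:
            accepted.append((cycle, kind, adr))
            queue.popleft()  # the next command is presented from the next cycle
    return accepted


cmds = [("read", 0x10), ("write", 0x20), ("refresh", None)]
print(run_handshake(cmds, lambda c: c % 2 == 0, lambda c: c >= 4))
# [(0, 'read', 16), (4, 'write', 32), (6, 'refresh', None)]
```

The write command is held at the head of the queue until write data is available, so the refresh behind it is also delayed.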

20.13.6.2 Command Multiplexor Sub-block Description

The command multiplexor sub-block issues read, write or refresh commands to the DCU, according to the
SoPEC Unit selected for eDRAM access by the arbitration logic. The command multiplexor also signals
the arbitration logic to perform arbitration to select the next SoPEC Unit for eDRAM access. Re-arbitra-
tion takes place, in general, when the DCU indicates on dcu_cmd_accept that it has accepted the previous
command.

A state-machine for the command multiplexor is shown in Figure 90.
Figure 90. Arbitration and Address Transfer state-machine (transitions labelled reset = 0, ArbitrationEnable = 0
and ArbitrationEnable = 1)
The states in Figure 90 are defined as follows:

Table 87. Command Multiplexor state description

State | Description
IDLE | Controller goes to this state on reset and when ArbitrationEnable is de-asserted.
RE-ARB | When ArbitrationEnable is asserted and there is a DIU requester, assert re_arbitrate so the Arbitration Logic will select the source of the next DRAM access, indicated by adr_sel.
ACK | Send acknowledge to the source of the next DRAM access indicated by adr_sel.
ADR | Receive the DRAM address from the source indicated by adr_sel and, in the next cycle, place it in the command queue along with adr_sel.
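A Python sketch of the state-machine follows. The transition arrows of Figure 90 are not recoverable here, so the transitions below are assumptions reconstructed from Table 87 (IDLE to RE-ARB to ACK to ADR, then back to RE-ARB while requesters remain); the output names `ack` and `capture_adr` are hypothetical.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    RE_ARB = "RE-ARB"
    ACK = "ACK"
    ADR = "ADR"


def next_state(state, arbitration_enable, requester, reset=False):
    """Hypothetical next-state function for the command multiplexor.

    The transitions are assumptions based on Table 87, not a copy of Figure 90.
    """
    if reset or not arbitration_enable:
        return State.IDLE
    if state in (State.IDLE, State.ADR):
        # Start (or continue) arbitrating only when some DIU requester exists.
        return State.RE_ARB if requester else State.IDLE
    if state is State.RE_ARB:
        return State.ACK   # arbitration has chosen adr_sel; acknowledge it
    return State.ADR       # from ACK: capture the winner's DRAM address


def outputs(state):
    """Signals asserted in each state (ack and capture_adr are hypothetical names)."""
    return {
        "re_arbitrate": state is State.RE_ARB,
        "ack": state is State.ACK,
        "capture_adr": state is State.ADR,
    }


s = State.IDLE
trace = []
for enable, req in [(1, 1), (1, 1), (1, 1), (1, 0), (0, 1)]:
    s = next_state(s, enable, req)
    trace.append(s.value)
print(trace)  # ['RE-ARB', 'ACK', 'ADR', 'IDLE', 'IDLE']
```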



In the ACK and ADR states of Figure 90, the signal adr_sel is used to multiplex between the SoPEC Units
to capture the eDRAM address of the winning requester, as illustrated in Figure 91. The winning address is
written into a command queue together with adr_sel. If the winning requester is refresh then no address is
written into the command queue.
Figure 91. Command multiplexor sub-block (inputs from SoPEC Unit 1, SoPEC Unit 2, ...; signals shown include
adr, control, dau_cmd_rwn, dau_cmd_refresh, write_data_avail, re_arbitrate, unit_gnt, adr_sel[4:0],
write_due[1:0], write_cmd_avail and read_cmd_avail)

The command at the head of the command queue drives the command issuing logic, which generates the
signals required by the DRAM Controller Unit. The validity of the command is indicated by
dau_cmd_avail. The commands are captured by the DCU by a dcu_cmd_accept strobe lasting 1 cycle.
This signal also causes the command queue FIFO to be popped so that the next command is available to
be captured by the DCU. For a write command, dau_cmd_avail will not be asserted until there is also valid
write data present. This is indicated by the signal write_data_avail from the Read-Write Data Multiplexor.

Normally, the command queue will only have one filled location, i.e. dcu_cmd_accept will cause a com-
mand to be captured by the DCU and re-arbitration will be kicked off so as to provide the next command
to the DCU in time for the next dcu_cmd_accept strobe. This is true for read and refresh commands. The
timing is shown in Figure 92. It is assumed there is a pipeline delay between dcu_cmd_accept and
re_arbitrate, and a further pipeline delay between the address received from the SoPEC Unit and the
address the DAU issues to the DCU.

For refresh commands:

    re_arbitrate <= dcu_cmd_accept AND (command queue not full)

For read commands which use the shared read bus, we must also ensure that the read multiplexor logic is
available to transmit the read data to the SoPEC read requester. This is indicated by the signal
read_cmd_avail, which provides flow control from the read data multiplexor logic. So for read commands:

    re_arbitrate <= dcu_cmd_accept AND (command queue not full) AND
                    (((cmd_adr_sel = shared read bus access) AND read_cmd_avail)
                     OR (cmd_adr_sel = CPU))

A shared read bus access is any read access except the CPU.
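The two re-arbitration conditions above can be written as one Python predicate. The string labels for `cmd_adr_sel` are hypothetical stand-ins for the decoded adr_sel of the current command.

```python
def re_arbitrate_read_refresh(cmd_adr_sel, dcu_cmd_accept, queue_full, read_cmd_avail):
    """Re-arbitration condition for refresh and read commands.

    cmd_adr_sel is one of the hypothetical labels 'refresh', 'cpu_read' or
    'shared_read'.
    """
    base = dcu_cmd_accept and not queue_full
    if cmd_adr_sel in ("refresh", "cpu_read"):
        return base
    if cmd_adr_sel == "shared_read":
        # Shared read bus accesses also need space in the read multiplexor.
        return base and read_cmd_avail
    raise ValueError(f"unexpected command type: {cmd_adr_sel!r}")


print(re_arbitrate_read_refresh("shared_read", True, False, False))  # False
print(re_arbitrate_read_refresh("cpu_read", True, False, False))     # True
```

The CPU read is not gated by read_cmd_avail because it has its own read channel rather than the shared read bus.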




Figure 92. Command Multiplexor sub-block timing for 2 cycle DRAM access read and refresh accesses
(signals: unit_gnt/ack, adr, command_queue1, command_queue2, dau_cmd_adr)

In the case of a write command, the write data must be transferred from the SoPEC requester before the
write can occur. Arbitration should occur early to allow for any delay in transferring the write data.
Figure 73 indicates that write data transfer over 64-bit busses will take a further 5 cycles after the address
is transferred. The arbitration logic produces the signals write_due[0] and write_due[1], which point to
timeslots 1 and 2 arbitration cycles in the future, to indicate future write accesses in advance.

For Toshiba and Philips 8 or 9 cycle DRAM write access no such future write arbitration is required. In
that case

    re_arbitrate <= dcu_cmd_accept AND (command queue not full) AND
                    (((cmd_adr_sel = shared read bus access) AND read_cmd_avail)
                     OR ((cmd_adr_sel = write access) AND write_cmd_avail))

For 2 cycle random DRAM access and 5 cycle write data transfer latency, both write_due[0] and
write_due[1] are required. The command queue must be sized at 3 deep. If the timeslot sequence for
DRAM access is a read, followed by a write, followed by a write, then arbitration should occur to select
the read access, immediately followed by re-arbitration twice more to select the future write accesses. The
condition for re-arbitration in advance of the write access is

    re_arbitrate <= (command queue not full) AND write_cmd_avail AND (write_due[0] OR write_due[1])

write_cmd_avail provides flow control from the write data multiplexor logic.
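The two write-related conditions can be compared side by side in a small Python sketch; the command labels are hypothetical, and each function transcribes the corresponding condition given above.

```python
def re_arbitrate_slow_dram(cmd, dcu_cmd_accept, queue_full, read_cmd_avail, write_cmd_avail):
    """Toshiba/Philips 8 or 9 cycle write access: no look-ahead is needed.

    cmd is a hypothetical label: 'shared_read' or 'write'.
    """
    if not (dcu_cmd_accept and not queue_full):
        return False
    if cmd == "shared_read":
        return read_cmd_avail
    if cmd == "write":
        return write_cmd_avail
    raise ValueError(f"unexpected command type: {cmd!r}")


def re_arbitrate_write_ahead(queue_full, write_cmd_avail, write_due):
    """2 cycle random access: re-arbitrate early for writes flagged by write_due[1:0].

    write_due is the pair (write_due[0], write_due[1]). Unlike the other
    conditions, this one does not wait for dcu_cmd_accept.
    """
    return (not queue_full) and write_cmd_avail and (write_due[0] or write_due[1])


print(re_arbitrate_write_ahead(False, True, (False, True)))  # True
```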
Open Issue 

The mechanism described here only pre-empts write accesses indicated by write_due[1:0] based on fixed
timeslot allocation. If a write access is selected based on un-used timeslot re-allocation then it will intro-
duce a latency in the access stream. One possibility is to disallow un-used timeslot re-allocation in the case
of write accesses. An alternative is to use 256-bit data busses to transfer write data, but this is likely to
increase the area of the DIU and/or decrease the chip wiring utilization. Another possibility is to have
separate un-used timeslot re-allocation logic for write accesses associated with write_due[1:0], i.e.
write requests are always only considered 1 or 2 arbitration cycles in the future. In this case there is effec-
tively a different time window for considering write requests and read requests.
20.13.7 Read and Write Multiplexor Sub-block

Table 88. Read and Write Multiplexor Sub-block IO Definition

Port name | Pins | I/O | Description
Clocks and Resets
pclk | 1 | In | System clock
prst_n | 1 | In | System reset, synchronous active low
DIU Read Interface to SoPEC Units
diu_data[63:0] | 64 | Out | Data from DIU to SoPEC Units except CPU. First 64 bits are bits 63:0 of the 256-bit word; second 64 bits are bits 127:64; third 64 bits are bits 191:128; fourth 64 bits are bits 255:192.
diu_cpu_data[255:0] | 256 | Out | 256-bit data from DIU to CPU.
diu_<unit>_rvalid | 1 | Out | Signal from DIU telling the SoPEC Unit that valid read data is on the diu_data bus.
DIU Write Interface to SoPEC Units
<unit>_diu_data[63:0] | 64 | In | Data from SoPEC Unit to DIU except CPU. First 64 bits are bits 63:0 of the 256-bit word; second 64 bits are bits 127:64; third 64 bits are bits 191:128; fourth 64 bits are bits 255:192.
cpu_diu_data[31:0] | 32 | In | Data from CPU to DIU.
cpu_addr[4:0] | 5 | In | Lower bits of CPU write address, indicating which byte within the 256-bit DRAM word is selected.
cpu_diu_wmask[1:0] | 2 | In | Flag indicating format of CPU write to DRAM: "00" = 8-bit write; "01" = 16-bit write; "10" = 32-bit write; "11" = reserved. cpu_addr[2:0] are driven in accordance with the width of the data access indicated by cpu_diu_wmask. Addresses cannot cross a 256-bit word DRAM boundary.
<unit>_diu_wvalid | 1 | In | Signal from SoPEC Unit indicating that data on <unit>_diu_data is valid.
Internal Inputs
unit_gnt | 1 | In | Signal lasting 1 cycle which indicates arbitration has occurred.
adr_sel[4:0] | 5 | In | Signal indicating which requesting SoPEC Unit has won arbitration.
Internal Outputs
read_cmd_avail | 1 | Out | Signal indicating that the command multiplexor can issue read accesses.
write_cmd_avail | 1 | Out | Signal indicating that the command multiplexor can issue write accesses.
write_data_avail | 1 | Out | Signal indicating that valid write data is available for the current command.
DCU Inputs
dcu_rdata | 256 | In | 256-bit read data from DCU.
dcu_rvalid | 1 | In | Signal indicating valid read data on dcu_rdata.
DCU Outputs
dau_wdata | 256 | Out | 256-bit write data to DCU.
dau_wmask | 256 | Out | 256-bit write data mask to DCU for IBM DRAM (byte masks are used for Philips and Toshiba DRAM).
dau_wvalid | 1 | Out | Signal indicating valid write data and write mask.
Debug signals
debug_wdata | 256 | In | 256-bit debug write data.
debug_wvalid | 1 | In | 256-bit debug write data valid.
read_adr_sel[4:0] | 5 | Out | Signal indicating the SoPEC Unit for which the current read transaction is occurring.
read_complete | 1 | Out | Signal indicating that the read transaction to the SoPEC Unit indicated by read_adr_sel is complete.
write_adr_sel[4:0] | 5 | Out | Signal indicating the SoPEC Unit for which the current write transaction is occurring.
write_complete | 1 | Out | Signal indicating that the write transaction to the SoPEC Unit indicated by write_adr_sel is complete.
20.13.7.1 Read Multiplexor logic description

Figure 93. Read multiplexor logic (dcu_rvalid and 256-bit dcu_rdata are routed either as 256-bit diu_cpu_data
with rvalid to the CPU, or as 64-bit diu_data with rvalid to SoPEC Unit 1, SoPEC Unit 2, ..., SoPEC Unit n)

There are 2 read channels: one for the CPU and a shared read bus for the rest of SoPEC. The shared read
bus has buffering for 2 times 256 bits of read data, i.e. 256 bits of data can be received from the DCU while
data is being transferred over the 64-bit shared read bus to the SoPEC Units. Once a read address is issued
by the arbitration logic, the adr_sel[4:0] value is put into a read command queue in the read control logic.
The queued adr_sel[4:0] values allow the dcu_rvalid and read data from the DCU to be directed to the
correct source. In the case of the CPU bus, dcu_rvalid and dcu_rdata can be multiplexed by the
adr_sel[4:0] value at the head of the FIFO directly to the CPU. If the incoming data goes over the shared
read channel then the data is stored in a 2 deep 256-bit read data buffer and output over 4 cycles to the
SoPEC requester when the previous transaction on the shared read bus is complete.

The depth of the adr_sel[4:0] read command queue is 2. When the queue is full no further adr_sel[4:0]
values can be accepted and no further read commands can be issued by the command multiplexor to the DCU.
This provides flow control back to the re-arbitration logic in the command multiplexor. The signal
read_cmd_avail indicates that spaces are available in the read command queue.

The 2 deep command queue and the double data buffer mean that the access rate will be limited by which-
ever takes longer: the DRAM access or the transfer of read data over the shared read data bus. Some extra
logic may need to be added to time the assertion of read_cmd_avail so that the read latency is kept to a
minimum.
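The read command queue flow control can be sketched in Python. The sketch assumes an entry is popped when the matching dcu_rvalid arrives; the method name `on_dcu_rvalid` is hypothetical.

```python
from collections import deque


class ReadCommandQueue:
    """Sketch of the 2 deep adr_sel[4:0] read command queue.

    Assumption: an entry is popped when the matching dcu_rvalid arrives.
    """

    DEPTH = 2

    def __init__(self):
        self._q = deque()

    @property
    def read_cmd_avail(self):
        # Space available: the command multiplexor may issue another read.
        return len(self._q) < self.DEPTH

    def push(self, adr_sel):
        if not self.read_cmd_avail:
            raise RuntimeError("read command issued while read_cmd_avail is low")
        self._q.append(adr_sel & 0x1F)  # adr_sel is 5 bits wide

    def on_dcu_rvalid(self):
        """Return the SoPEC Unit index that the incoming 256-bit read data belongs to."""
        return self._q.popleft()


q = ReadCommandQueue()
q.push(3)
q.push(17)
print(q.read_cmd_avail)   # False
print(q.on_dcu_rvalid())  # 3
print(q.read_cmd_avail)   # True
```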



20.13.7.2 Write Multiplexor logic description

Figure 94. Write multiplexor logic (inputs cpu_diu_wdata (32 bits), cdu_diu_wdata (64 bits) and <unit>_diu_data
from SoPEC Unit 1, ...; write data multiplexor duplicated (x2); write data buffer)

Once a write address is issued by the arbitration logic, the adr_sel[4:0] value is put into a write command
queue FIFO in the write control logic. The queued adr_sel[4:0] values allow the wvalid and write data
from the SoPEC requester to be multiplexed to the DCU. The write multiplexor logic is duplicated to
provide two overlapping write channels. If 256-bit write data busses are used then a single write channel,
which can be shared by the CPU, is all that is required.

The depth of the adr_sel[4:0] queue is 3. When the queue is full no further adr_sel[4:0] values from the
CPU Arbitration block will be accepted. This provides flow control back to the re-arbitration logic in the
Command Multiplexor sub-block. The 2 channels cannot select the same SoPEC write requester.
write_cmd_avail is asserted whenever there is a space in the queue. There are 2 write pointers and 1 read
pointer; each of the channels has a write pointer associated with it. write_data_avail indicates that valid
write data is available to be issued along with the address the command multiplexor will issue.
There are 2 special cases for write accesses: CPU writes and CDU writes.

CPU writes

In the case of CPU writes the CPU write data bus is only 32 bits wide. cpu_diu_wmask[1:0] indicates how
many bits have to be written: 8, 16 or 32 bits. The associated address cpu_addr[21:0] is a byte aligned
address. The actual DRAM write must be a 256-bit access. The command multiplexor issues the 256-bit
DRAM address cpu_addr[21:5]. cpu_addr[4:0] and cpu_diu_wmask[1:0] are used to calculate the bit
write mask wmask[255:0] for the write access.

8-bit write, cpu_diu_wmask[1:0] = '00':
    bits 8*cpu_addr[4:0] to 8*cpu_addr[4:0] + 7 of wmask[255:0] are asserted.
16-bit write, cpu_diu_wmask[1:0] = '01':
    bits 8*cpu_addr[4:0] to 8*cpu_addr[4:0] + 15 of wmask[255:0] are asserted.
32-bit write, cpu_diu_wmask[1:0] = '10':
    bits 8*cpu_addr[4:0] to 8*cpu_addr[4:0] + 31 of wmask[255:0] are asserted.
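The CPU write mask calculation can be checked with a short Python sketch. wmask[255:0] is represented as a Python int (bit i of the int is wmask[i]), and the alignment check reflects the requirement that cpu_addr is driven in accordance with the access width; the function name is hypothetical.

```python
WIDTHS = {0b00: 8, 0b01: 16, 0b10: 32}  # cpu_diu_wmask[1:0] -> access width in bits


def cpu_write(cpu_addr, cpu_diu_wmask):
    """Return (dram_adr, wmask) for a CPU write.

    dram_adr is cpu_addr[21:5]; wmask is wmask[255:0] as a Python int
    (bit i of the int corresponds to wmask[i]).
    """
    if cpu_diu_wmask not in WIDTHS:
        raise ValueError("cpu_diu_wmask = '11' is reserved")
    width = WIDTHS[cpu_diu_wmask]
    byte = cpu_addr & 0x1F                 # cpu_addr[4:0]
    if (8 * byte) % width:
        raise ValueError("address is not aligned to the access width")
    dram_adr = (cpu_addr >> 5) & 0x1FFFF   # cpu_addr[21:5]
    wmask = ((1 << width) - 1) << (8 * byte)
    return dram_adr, wmask


adr, mask = cpu_write(0x123, 0b00)  # 8-bit write to byte 3 of DRAM word 9
print(adr, hex(mask))  # 9 0xff000000
```

Because aligned accesses of 8, 16 or 32 bits always fit inside one 256-bit word, the mask never needs to span two DRAM words.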

CDU writes

Each CDU write access is a burst of 4 times 64 bits of write data to the 64 bits of the 256-bit DRAM word
indicated by cdu_diu_wadr[21:3] and the 3 subsequent 256-bit DRAM words. If these 4 DRAM words lie
in the same DRAM row then an efficient access will be obtained. The command multiplexor logic must
issue 4 successive accesses to the 256-bit DRAM addresses cdu_diu_wadr[21:5], cdu_diu_wadr[21:5]+1,
cdu_diu_wadr[21:5]+2 and cdu_diu_wadr[21:5]+3. wmask[255:0] is calculated using cdu_diu_wadr[4:3],
i.e. bits 64*cdu_diu_wadr[4:3] to 64*(cdu_diu_wadr[4:3]+1)-1 are asserted.
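The CDU burst expansion can be sketched in Python, taking the 19-bit value of cdu_diu_wadr[21:3] as input. Masking the incremented addresses to 17 bits is an assumption; the text does not say what happens at the top of the address space.

```python
def cdu_write_burst(cdu_diu_wadr):
    """Expand one CDU write into four (dram_adr, wmask) DRAM accesses.

    cdu_diu_wadr is the 19-bit value of cdu_diu_wadr[21:3] (64-bit word address).
    """
    base = cdu_diu_wadr >> 2        # cdu_diu_wadr[21:5]: 256-bit DRAM word address
    lane = cdu_diu_wadr & 0x3       # cdu_diu_wadr[4:3]: 64-bit lane within the word
    mask = ((1 << 64) - 1) << (64 * lane)  # same lane in each of the 4 words
    # 17-bit wrap of the incremented address is an assumption
    return [((base + i) & 0x1FFFF, mask) for i in range(4)]


burst = cdu_write_burst(0b10110)    # word address 5, lane 2
print([adr for adr, _ in burst])    # [5, 6, 7, 8]
```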
20.13.8 DIU Debug Sub-Block

Table 89. DIU Debug Sub-block IO Definition

Port name | Pins | I/O | Description
Clocks and Resets
pclk | 1 | In | System clock
prst_n | 1 | In | System reset, synchronous active low
CPU Interface and Arbitration Logic
debug_enable | 1 | In | 1 = Enable DIU Debug; 0 = Disable DIU Debug
debug_start_adr[21:5] | 17 | In | DIU debug start address.
debug_end_adr[21:5] | 17 | In | DIU debug end address.
debug_req | 1 | Out | DIU request signal from Debug logic.
<unit>_diu_rreq | 1 | In | SoPEC unit requests DRAM read. A read request must be accompanied by a valid read address.
<unit>_diu_wreq | 1 | In | SoPEC unit requests DRAM write. A write request must be accompanied by a valid write address.
unit_gnt | 1 | In | Signal lasting 1 cycle which indicates arbitration has occurred. Indicates adr_sel and write_due are valid.
adr_sel[4:0] | 5 | In | Signal indicating which requesting SoPEC Unit has won arbitration.
access_type[3:0] | 4 | In | Signal indicating the origin of the winning arbitration: 0000 = main timeslot; 0001 = sub1 timeslot; 0010 = sub2 timeslot; 0011 = sub3 timeslot; 0100 = sub4 timeslot; 0101 = round-robin level1; 0110 = round-robin level2; 0111 = priority; 1000 = CPU round-robin
DRAM Control Unit
dcu_refresh_complete | 1 | In | Signal indicating that the DCU has completed a refresh. Exact timing needs to be defined.
Read Write Data Multiplexor
debug_wdata | 256 | Out | 256-bit debug write data
debug_wvalid | 1 | Out | 256-bit debug write data valid
read_adr_sel[4:0] | 5 | In | Signal indicating the SoPEC Unit for which the current read transaction is occurring.
read_complete | 1 | In | Signal indicating that the read transaction to the SoPEC Unit indicated by read_adr_sel is complete.
write_adr_sel[4:0] | 5 | In | Signal indicating the SoPEC Unit for which the current write transaction is occurring.
write_complete | 1 | In | Signal indicating that the write transaction to the SoPEC Unit indicated by write_adr_sel is complete.



External visibility of the DIU must be provided for debug purposes. To allow this, special debug logic is
added to the DIU. When DIU debug is enabled by debug_enable, the DIU debug sub-block will collect
debug data. The DIU debug sub-block will itself have a 256-bit double buffer interface to the DIU. When
256 bits of debug data have been collected, the DIU debug sub-block will request access to DRAM by
asserting debug_req. When the request is acknowledged on diu_debug_wack, the DIU debug circuit will supply a
write address, 256 bits of write data and a write valid. The arbitration logic will give the DIU Debug priority over CPU
slots. This means the Debug 256-bit double buffer will never overflow, and debug will not affect the access of other
blocks except the CPU (this should not be important, as debug will request with a low frequency and there are
many timeslots assigned to the CPU). The DIU Debug circuit will generate a write address by incrementing (and
wrapping around) between debug_start_adr[21:5] and debug_end_adr[21:5].
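The debug write address sequence can be sketched in Python, assuming debug_end_adr is inclusive (the last address written before wrapping back to debug_start_adr).

```python
def debug_addresses(start, end, count):
    """Return `count` successive 256-bit word addresses for debug writes.

    Addresses increment from start and wrap back to start after end
    (end is assumed to be inclusive).
    """
    if not start <= end:
        raise ValueError("debug_start_adr must not exceed debug_end_adr")
    adr = start
    out = []
    for _ in range(count):
        out.append(adr)
        adr = start if adr == end else adr + 1
    return out


print(debug_addresses(0x100, 0x102, 5))  # [256, 257, 258, 256, 257]
```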

Two kinds of debug information seem sensible to gather:

a. The order and source of DIU requesters winning. This is obtained by storing adr_sel[4:0]
along with the access_type[3:0] every time unit_gnt is asserted.

b. The time between a DIU requester requesting an access and completing the access. This infor-
mation is obtained by having a counter for each DIU requester. The counter is reset and starts
counting when the Unit starts requesting. The count is reset when the read or write access is
complete, as indicated by read_complete AND read_adr_sel[4:0], or write_complete AND
write_adr_sel[4:0]. Completion of a refresh is indicated by dcu_refresh_complete. Typi-
cally most SoPEC DIU requesters require an access every 256 cycles, so a 10-bit counter is suf-
ficient for most requesters. HCU and TE(TFS) require only a few accesses per line, so in this
case 15-bit counters are adequate. The count is returned along with the index of the SoPEC
Unit.
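A Python sketch of one per-requester latency counter follows. It assumes the counter saturates rather than wraps, and that `complete` is the decoded completion for this unit (for example read_complete AND read_adr_sel matching this unit); both are assumptions, not taken from the text.

```python
class LatencyCounter:
    """Sketch of one per-requester latency counter (10 or 15 bits wide)."""

    def __init__(self, unit_index, width=10):
        self.unit = unit_index
        self.limit = (1 << width) - 1
        self.count = 0
        self.active = False

    def clock(self, requesting, complete):
        """Advance one cycle; return (unit_index, count) when the access completes."""
        if self.active and complete:
            result = (self.unit, self.count)
            self.active = False
            self.count = 0
            return result
        if self.active:
            if self.count < self.limit:
                self.count += 1          # saturate at the top (assumption)
        elif requesting:
            self.active = True           # unit starts requesting: start counting
            self.count = 0
        return None


c = LatencyCounter(5)
for req, done in [(True, False), (True, False), (True, False)]:
    c.clock(req, done)
print(c.clock(False, True))  # (5, 2)
```

A 15-bit instance for HCU or TE(TFS) would be created with `LatencyCounter(index, width=15)`.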

The two kinds of debug information both need to be written to the DRAM debug channel. This can be
achieved by filling separate 256-bit double buffers with the 2 sources of information. Each 256-bit word
may contain un-used bits depending on the packing. The first bit of each 256-bit debug word will indicate
which of the two kinds of debug information is contained therein (0 for a, 1 for b above).
PEP Subsystem

21 PEP Controller Unit (PCU)

21.1 Overview 

The PCU has three functions: 

• The first is to act as a bus bridge between the CPU-bus and the PCU-bus for reading and writing PEP
configuration registers.

• The second is to support page banding by allowing the PEP blocks to be reprogrammed between bands
by retrieving commands from DRAM instead of being programmed directly by the CPU.

• The third is to send register debug information to the RDU, within the CPU subsystem, when the PCU 
is in Debug Mode. 



21.2 INTERFACES BETWEEN PCU AND OTHER UNITS 



Figure 95. Block diagram of PCU (CPU interface signals cpu_adr, cpu_dataout, pcu_cpu_data, cpu_rwn,
cpu_pcu_sel, pcu_cpu_rdy, cpu_acode, pcu_cpu_berr and pcu_cpu_debug_valid; PCU-bus signals XXX_pcu_rdy,
XXX_pcu_data, pcu_XXX_sel, pcu_dataout, pcu_adr and pcu_rwn; end of band unit receiving cdu_finishedband,
lbd_finishedband and te_finishedband from the CDU, LBD and TE; state machine; DRAM interface; Interrupt
Controller Unit (ICU))



21.3 Bus Bridge

The PCU is a bus bridge between the CPU-bus and the PCU-bus. The PCU is a slave on the CPU-bus and
is the only master on the PCU-bus. See Figure 13 on page 39.



Doc: SoPEC_hardware_design S3 Proprietary Document 

Version: 2.3 



29 Nov 2002 

Page 250 




SoPEC : Hardware Design 



21.3.1 CPU accessing PEP

All the blocks in the PEP can be addressed by the CPU via the PCU. The MMU in the CPU-subsystem
will decode a PCU select signal, cpu_pcu_sel, for all the PCU mapped addresses (see section 11.4.3 on
page 70). Using cpu_adr bits 15-12 the PCU will decode individual block selects for each of the blocks
within the PEP. The PEP blocks then decode the remaining address bits needed to address their PCU-bus
mapped registers. Note: the CPU is only permitted to perform supervisor-mode data-type accesses of the
PEP, i.e. cpu_acode = 11. If the PCU is selected by the CPU and any other code is present on the
cpu_acode bus, the access is ignored by the PCU and the pcu_cpu_berr signal is strobed.
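The PCU's handling of a selected CPU access can be sketched in Python. The set of valid block codes is a hypothetical parameter, because the PEP block map is not given in this section. The invalid-block branch follows the address legality rule for CPU commands described in Section 21.5.

```python
SUPERVISOR_DATA = 0b11  # cpu_acode[1:0] for a supervisor data access


def decode_cpu_access(cpu_adr, cpu_acode, valid_blocks):
    """Sketch of the PCU decode of a CPU access while cpu_pcu_sel is high.

    cpu_adr      -- byte address; only bits 15:2 are present on the bus
    valid_blocks -- set of cpu_adr[15:12] values that map to PEP blocks
                    (hypothetical: the real block map is defined elsewhere)
    Returns (status, block, offset); status is 'ok', 'berr' or 'invalid_address'.
    """
    if cpu_acode != SUPERVISOR_DATA:
        return "berr", None, None          # access ignored, pcu_cpu_berr strobed
    block = (cpu_adr >> 12) & 0xF           # cpu_adr[15:12] selects the PEP block
    if block not in valid_blocks:
        return "invalid_address", None, None
    return "ok", block, cpu_adr & 0xFFC     # bits [11:2] are decoded by the block


print(decode_cpu_access(0x3A04, 0b11, {0, 1, 2, 3}))  # ('ok', 3, 2564)
```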

CPU commands have priority over DRAM commands. When the PCU is executing each set of four com-
mands retrieved from DRAM, the CPU can access PCU-bus registers. In the case that DRAM commands
are being executed and the CPU resets CmdSource to zero, the contents of the DRAM CmdFifo are
invalidated and no further commands from the FIFO are executed. The CmdPending and NextBandCmdEn-
able work registers are also cleared.

21.4 Page banding

The PCU can be programmed to associate microcode in DRAM with each finishedband signal. When a
finishedband signal is asserted the PCU will read commands from DRAM and execute these commands.
These commands are each 64 bits (see Section 21.8.5), consist of 32 address bits and 32 data bits,
and allow PCU mapped registers to be programmed directly by the PCU.

If more than one finishedband signal is received at the same time, or others are received while microcode
is already executing, the PCU will hold the commands as pending, and will execute them at the first oppor-
tunity.

Each microcode program associated with cdu_finishedband, lbd_finishedband and te_finishedband would
simply restart the appropriate unit with new addresses, a total of about 4 or 5 microcode instructions. As
well, or alternatively, pcu_finishedband can be used to set up all of the units, and therefore involves many
more instructions. This minimizes the time that a unit is idle in between bands. The pcu_finishedband con-
trol signal is issued once the specified combination of CDU, LBD and TE (programmed in BandSelectMask)
have finished their processing for a band.

21.5 Interrupts, address legality and security

Interrupts are generated when the various page expansion units have finished a particular band of data
from DRAM. The cdu_finishedband, lbd_finishedband and te_finishedband signals are combined in the
PCU into a single interrupt pcu_finishedband, which is exported by the PCU to the interrupt controller.

The PCU mapped registers should only be accessible from Supervisor Data Mode. The area of DRAM
where PCU commands are stored should be a Supervisor Mode only DRAM area. Configuration register
address legality is not enforced by the MMU, i.e. the MMU does not check if the block address points to a
valid PEP subsystem block. When the PCU is executing commands from the CPU, any block-address decoded
from a command which is not part of the PEP block-address map will cause the PCU to ignore the com-
mand and strobe the pcu_invalid_address interrupt signal. The CPU can then interrogate the PCU to find
the source of the illegal command.

When the PCU is executing commands from DRAM, any address decoded from a command which is not
part of the PEP address map will cause the PCU to:

• Cease execution of the current command and flush all remaining commands already retrieved from
DRAM.

• Clear the CmdPending work-register.

• Clear the NextBandCmdEnable registers.

• Set CmdSource to zero.

In addition to cancelling all current and pending DRAM accesses, the PCU strobes the
pcu_invalid_address interrupt signal. The CPU can then interrogate the PCU to find the source of the ille-
gal command.

21.6 Debug Mode

When there is a need to monitor the (possibly changing) value in any PEP configuration register, the PCU may be
placed in Debug Mode. This is done via the CPU setting certain Debug Address and Debug Enable regis-
ters within the PCU. Once in Debug Mode the PCU continually performs read accesses of the target PEP
configuration register (following the protocol detailed in Section 21.8.2) and sends the read value to the
RDU. Debug Mode has the lowest priority of all PCU functions: if the CPU wishes to perform an access or
there are DRAM commands to be executed, they will interrupt the Debug access, and the PCU will resume
Debug access once a CPU or DRAM command has completed.
21.7 Implementation

21.7.1 Definitions of I/O

Table 90. PCU Port List

Port name | Pins | I/O | Description
----------|------|-----|------------
Clocks and Resets
pclk | 1 | In | SoPEC functional clock
prst_n | 1 | In | Active-low, synchronous reset in pclk domain
End of Band Functionality
cdu_finishedband | 1 | In | Finished band signal from CDU
lbd_finishedband | 1 | In | Finished band signal from LBD
te_finishedband | 1 | In | Finished band signal from TE
pcu_finishedband | 1 | Out | Asserted once the specified combination of CDU, LBD, and TE have finished their processing for a band.
PCU address error
pcu_icu_address_invalid | 1 | Out | Strobed if the PCU decodes a non-PEP address from commands retrieved from DRAM or CPU.
CPU Subsystem Interface Signals
cpu_adr[15:2] | 14 | In | CPU address bus. 14 bits are required to decode the address space for the PEP.
cpu_dataout[31:0] | 32 | In | Shared write data bus from the CPU
pcu_cpu_data[31:0] | 32 | Out | Read data bus to the CPU
cpu_rwn | 1 | In | Common read/not-write signal from the CPU
cpu_acode[1:0] | 2 | In | CPU Access Code signals. These decode as follows: 00 - User program access; 01 - User data access; 10 - Supervisor program access; 11 - Supervisor data access
cpu_pcu_sel | 1 | In | Block select from the CPU. When cpu_pcu_sel is high both cpu_adr and cpu_dataout are valid.
pcu_cpu_rdy | 1 | Out | Ready signal to the CPU. When pcu_cpu_rdy is high it indicates the last cycle of the access. For a write cycle this means cpu_dataout has been registered by the block and for a read cycle this means the data on pcu_cpu_data is valid.
pcu_cpu_berr | 1 | Out | Bus error signal to the CPU indicating an invalid access.
pcu_cpu_debug_valid | 1 | Out | Debug data valid on pcu_cpu_data bus. Active high.
PCU Interface to PEP blocks
pcu_adr[11:2] | 10 | Out | PCU address bus. The 10 least significant bits of cpu_adr[15:2] allow 1024 32-bit word addressable locations per PEP block. Only the number of bits required to decode the address space are exported to each block.
pcu_dataout[31:0] | 32 | Out | Shared write data bus from the PCU
<unit>_pcu_datain[31:0] | 32 | In | Read data bus from each PEP subblock to the PCU
pcu_rwn | 1 | Out | Common read/not-write signal from the PCU
pcu_<unit>_sel | 1 | Out | Block select for each PEP block from the PCU. Decoded from the 4 most significant bits of cpu_adr[15:2]. When pcu_<unit>_sel is high both pcu_adr and pcu_dataout are valid.
<unit>_pcu_rdy | 1 | In | Ready signal from each PEP block to the PCU. When <unit>_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on <unit>_pcu_datain is valid.
DIU Read Interface signals
pcu_diu_rreq | 1 | Out | PCU requests DRAM read. A read request must be accompanied by a valid read address.
pcu_diu_radr[21:5] | 17 | Out | Read address to DIU. 17 bits wide (256-bit aligned word).
diu_pcu_rack | 1 | In | Acknowledge from DIU that the read request has been accepted and a new read address can be placed on pcu_diu_radr
diu_data[63:0] | 64 | In | Data from DIU to PCU. First 64 bits are bits 63:0 of the 256-bit word; second 64 bits are bits 127:64; third 64 bits are bits 191:128; fourth 64 bits are bits 255:192.
diu_pcu_rvalid | 1 | In | Signal from DIU telling the PCU that valid read data is on the diu_data bus
21.7.2 Configuration Registers

Table 91. PCU Configuration Registers

Address | Register | #bits | Reset | Description
--------|----------|-------|-------|------------
Control registers
0x00 | Reset | 1 | 0x1 | A write to this register causes a reset of the PCU. This register can be read to indicate the reset state: 0 - reset in progress; 1 - reset not in progress.
0x04 | CmdAdr[21:5] (256-bit aligned DRAM address) | 17 | 0x00000 | The address of the next set of commands to retrieve from DRAM. When this register is written to, either by the CPU or by a DRAM command, 1 is also written to CmdSource to cause the execution of the commands at the specified address.
0x08 | BandSelectMask[2:0] | 3 | 0x0 | Selects which input finishedBand flags are to be watched to generate the combined finishedAllBand signal. Bit0 - lbd_finishedband; Bit1 - cdu_finishedband; Bit2 - te_finishedband.
0x0C, 0x10, 0x14, 0x18 | NextBandCmdAdr[3:0][21:5] (256-bit aligned DRAM address) | 4x17 | 0x00000 | The address to transfer to CmdAdr as soon as possible after the next finishedBand[n] signal has been received, as long as NextBandCmdEnable[n] is set. A write from the PCU to NextBandCmdAdr[n] with a non-zero value also sets NextBandCmdEnable[n]. A write from the PCU to NextBandCmdAdr[n] with a 0 value clears NextBandCmdEnable[n].
0x20 | CmdSource | 1 | 0x0 | 0 - commands are taken from the CPU; 1 - commands are taken from the CPU as well as DRAM at CmdAdr.
0x24 | DebugSelect[15:2] | 14 | 0x0000 | Debug address select. Indicates the address of the register to report on the pcu_cpu_data bus when it is not otherwise being used, and the PEP bus is not being used. Bits [15:12] select the unit (see Table 92). Bits [11:2] select the register within the unit.
Work registers (read only)
0x28 | InvalidAddress[21:3] (64-bit aligned DRAM address) | 19 | 0 | Address of the illegal 64-bit command in DRAM. Only valid when pcu_icu_address_invalid has been strobed.
0x2C | CmdPending | 4 | 0 | For each bit: 0 - no commands pending for NextBandCmdAdr[n]; 1 - commands pending for NextBandCmdAdr[n]. Read only register.
0x34 | FinishedSoFar | 3 | 0x0 | The appropriate bit is set whenever the corresponding input finishedBand flag is set and the corresponding bit in the BandSelectMask register is also set. If all FinishedSoFar bits are set wherever BandSelectMask bits are also set, all FinishedSoFar bits are cleared and the output finishedAllBand signal is given. Read only register.
0x38 | NextBandCmdEnable | 4 | 0x0 | This register can be written to indirectly (i.e. the bits are set or cleared via writes to NextBandCmdAdr[n]). For each bit: 0 - do nothing at the next finishedBand[n] signal; 1 - execute instructions at NextBandCmdAdr[n] as soon as possible after receipt of the next finishedBand[n] signal. Bit0 - lbd_finishedband; Bit1 - cdu_finishedband; Bit2 - te_finishedband; Bit3 - finishedAllBand. Read only register.



21.8 Detailed Description

21.8.1 PEP Blocks Register Map

All PEP accesses are 32-bit register accesses.

From Table 92 it can be seen that only four bits are necessary to address each of the sub-blocks within the PEP part of SoPEC. Up to 14 bits may be used to address any configurable 32-bit register within PEP. This gives scope for 1024 configurable registers per sub-block. This address will come either from the CPU or from a command stored in DRAM. The bus is assembled as follows:

• adr[15:12] = sub-block address

• adr[11:2] = 32-bit register address within the sub-block; only the number of bits required to decode the registers within each sub-block are used.



Table 92. PEP blocks Register Map

Block | Block address (adr[15:12])
------|---------------------------
PCU | 0x0
CDU | 0x1
CFU | 0x2
LBD | 0x3
SFU | 0x4
TE | 0x5
TFU | 0x6
HCU | 0x7
DNC | 0x8
DWU | 0x9
LLU | 0xA
PHI | 0xB
Reserved | 0xC to 0xF
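The address split in 21.8.1 can be illustrated with a small Python sketch. It decodes a 16-bit PEP byte address into the sub-block code of Table 92 and a 32-bit register index; the function name, and raising ValueError for the reserved codes 0xC to 0xF, are illustrative choices rather than part of the SoPEC design.

```python
# Sub-block select codes from Table 92; codes 0xC to 0xF are reserved.
PEP_BLOCKS = {
    0x0: "PCU", 0x1: "CDU", 0x2: "CFU", 0x3: "LBD",
    0x4: "SFU", 0x5: "TE", 0x6: "TFU", 0x7: "HCU",
    0x8: "DNC", 0x9: "DWU", 0xA: "LLU", 0xB: "PHI",
}


def decode_pep_address(adr: int) -> tuple:
    """Split a PEP byte address into (sub-block name, 32-bit register index).

    adr[15:12] selects the sub-block and adr[11:2] selects the register,
    giving up to 1024 registers per sub-block. adr[1:0] is ignored because
    all PEP accesses are 32-bit. Reserved sub-block codes raise ValueError
    (a hypothetical stand-in for the PCU's address-invalid handling).
    """
    unit = (adr >> 12) & 0xF
    register = (adr >> 2) & 0x3FF
    if unit not in PEP_BLOCKS:
        raise ValueError(f"reserved PEP sub-block code 0x{unit:X}")
    return PEP_BLOCKS[unit], register
```

For example, address 0x0020 decodes to register index 8 of the PCU, which is the CmdSource register at offset 0x20 in Table 91.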



21.8.2 Internal PCU PEP protocol

The PCU performs PEP configuration register accesses via a select signal, pcu_<block>_sel. The read/write sense of the access is communicated via the pcu_rwn signal (1 = read, 0 = write). Write data is clocked out, and read data clocked in, upon receipt of the appropriate select-read/write-address combination.



Figure 96. PCU accesses to PEP registers (timing diagram of a write followed by a read, showing pclk, pcu_adr, pcu_rwn, pcu_dataout[31:0], pcu_<block>_sel, <block>_pcu_rdy and <block>_pcu_data[31:0])

Figure 96 shows a write operation followed by a read operation. The read operation is shown with wait states while the PEP block returns the read data.

For access to the PEP blocks a simple bus protocol is used. The PCU first determines which particular PEP block is being addressed so that the appropriate block select signal can be generated. During a write access, PCU write data is driven out with the address and block select signals in the first cycle of an access. The addressed PEP block responds by asserting its ready signal, indicating that it has registered the write data and the access can complete. The write data bus is common to all PEP blocks.

A read access is initiated by driving the address and select signals during the first cycle of an access. The addressed PEP block responds by placing the read data on its bus and asserting its ready signal to indicate to the PCU that the read data is valid. Each block has a separate point-to-point data bus for read accesses to avoid the need for a tri-stateable bus.
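The handshake can be modelled cycle by cycle. The hypothetical helpers below (not part of the design) list the select/ready values seen on each pclk cycle of one access for a given number of block wait states, and show the point-to-point read path as a simple selection among per-block buses. Wait-state counts are a free parameter here; the text does not fix them.

```python
def pep_access_trace(wait_states: int) -> list:
    """Per-cycle (pcu_<block>_sel, <block>_pcu_rdy) pairs for one PEP access.

    The PCU drives address and select (plus write data for a write) from the
    first cycle and holds them until the addressed block asserts ready; the
    access completes in the cycle in which ready is high.
    """
    if wait_states < 0:
        raise ValueError("wait_states must be non-negative")
    return [(1, 0)] * wait_states + [(1, 1)]


def read_data_mux(selected_unit: int, unit_read_buses: dict) -> int:
    """Return the read data from the selected block's own read bus.

    Because every block has its own <unit>_pcu_datain bus, the PCU only
    has to pick one of them instead of sharing a tri-state bus.
    """
    return unit_read_buses[selected_unit] & 0xFFFFFFFF
```

A zero-wait-state access completes in a single cycle; each wait state adds one cycle with select held high and ready low.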

21.8.3 PCU DRAM access requirements

The PCU can execute register programming commands stored in DRAM. These commands can be executed at the start of a print run to initialize all the registers of PEP. The PCU can also execute instructions at the start of a page, and between bands. In the inter-band time it is critical to have the PCU operate as fast as possible. Therefore, in the inter-page and inter-band time the PCU needs low latency access to DRAM.

A typical band change requires on the order of 4 commands to restart each of the CDU, LBD, and TE, followed by a single command to terminate the DRAM command stream. This is on the order of 5 commands per restart component.

The PCU does single 256-bit reads from DRAM. Each PCU command is 64 bits, so each 256-bit DRAM read can contain 4 PCU commands. The requested command is read from DRAM together with the next 3 contiguous 64-bit commands, which are cached to avoid unnecessary DRAM reads. Writing zero to CmdSource causes the PCU to flush commands and terminate program access from DRAM for that command stream. The PCU requires a 256-bit buffer to hold the 4 PCU commands read by each 256-bit DRAM access. When the buffer is empty the PCU can request DRAM access again. Adding a 256-bit double buffer would allow the next set of 4 commands to be fetched from DRAM while the current commands are being executed.

1024 commands of 64 bits require 8 KB of DRAM storage.

Programs stored in DRAM are referred to as PCU Program Code.
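The arithmetic above can be checked with a short Python sketch. It assumes command streams start on a 256-bit boundary (CmdAdr is a 256-bit aligned address in Table 91) and uses the diu_data transfer order given in Table 90; the helper names are hypothetical.

```python
COMMAND_BITS = 64
COMMANDS_PER_DRAM_READ = 256 // COMMAND_BITS  # 4 commands per 256-bit read


def split_dram_word(word256: int) -> list:
    """Split one 256-bit DRAM read into its four 64-bit PCU commands.

    Uses the diu_data transfer order: the first 64-bit beat is bits 63:0,
    then bits 127:64, 191:128 and 255:192.
    """
    mask64 = (1 << COMMAND_BITS) - 1
    return [(word256 >> (COMMAND_BITS * beat)) & mask64
            for beat in range(COMMANDS_PER_DRAM_READ)]


def dram_reads_needed(num_commands: int) -> int:
    """256-bit reads needed for a stream that starts on a 256-bit boundary."""
    return -(-num_commands // COMMANDS_PER_DRAM_READ)  # ceiling division


def program_storage_bytes(num_commands: int) -> int:
    """DRAM bytes occupied by num_commands 64-bit commands."""
    return num_commands * COMMAND_BITS // 8
```

With these helpers, a typical 5-command band change fits in two 256-bit reads, and 1024 commands occupy 8192 bytes (8 KB).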

21.8.4 End of band unit

The state machine is responsible for watching the various input xx_finishedband signals, setting the FinishedSoFar flags, and outputting the finishedAllBand flag as specified by the BandSelectMask register.

Each cycle, the end of band unit performs the following tasks:

finishedAllBand = (FinishedSoFar[0] == BandSelectMask[0]) AND
                  (FinishedSoFar[1] == BandSelectMask[1]) AND
                  (FinishedSoFar[2] == BandSelectMask[2]) AND
                  (BandSelectMask[0] OR BandSelectMask[1] OR BandSelectMask[2])

if (finishedAllBand == 1) then
    FinishedSoFar[0] = 0
    FinishedSoFar[1] = 0
    FinishedSoFar[2] = 0
else
    FinishedSoFar[0] = FinishedSoFar[0] OR (lbd_finishedband AND BandSelectMask[0])
    FinishedSoFar[1] = FinishedSoFar[1] OR (cdu_finishedband AND BandSelectMask[1])
    FinishedSoFar[2] = FinishedSoFar[2] OR (te_finishedband AND BandSelectMask[2])

Note that it is the responsibility of the microcode at the start of printing a page to ensure that all 3 FinishedSoFar bits are cleared. It is not necessary to clear them between bands since this happens automatically.
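As a cross-check of the end of band pseudocode, here is a minimal cycle-level Python model. Bit n of the mask and of FinishedSoFar follows the BandSelectMask bit assignment (bit 0 LBD, bit 1 CDU, bit 2 TE). Like the pseudocode, it evaluates finishedAllBand from the register state at the start of the cycle, so a finishedband strobe arriving in the same cycle as finishedAllBand is not recorded; the class and method names are illustrative.

```python
class EndOfBandUnit:
    """Cycle-level model of the end of band unit (illustrative only)."""

    def __init__(self, band_select_mask: int) -> None:
        self.band_select_mask = band_select_mask & 0b111
        self.finished_so_far = 0

    def clock(self, lbd: int, cdu: int, te: int) -> int:
        """Advance one pclk cycle and return finishedAllBand for that cycle."""
        # All selected bits finished, and at least one bit selected.
        finished_all = int(
            self.finished_so_far == self.band_select_mask
            and self.band_select_mask != 0
        )
        if finished_all:
            self.finished_so_far = 0
        else:
            # Record only the strobes of blocks selected in BandSelectMask.
            strobes = (lbd & 1) | ((cdu & 1) << 1) | ((te & 1) << 2)
            self.finished_so_far |= strobes & self.band_select_mask
        return finished_all
```

For a mask of 0b101 (LBD and TE), a CDU strobe is ignored and finishedAllBand is given in the cycle after both LBD and TE have finished.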
21.8.5 Executing commands from DRAM

Registers in PEP can be programmed by means of simple 64-bit commands fetched from DRAM. The format of the commands is given in Table 93. Register locations can have a data value of up to 32 bits. Commands are PEP register write commands only.

Table 93. Register write commands in PEP

Command | Bits 63-32 | Bits 31-16 | Bits 15-2 | Bits 1-0
--------|------------|------------|-----------|---------
Register write | data | zero | 32-bit word address | zero

Due attention must be paid to the endianness of the processor. The LEON processor is a big-endian processor (bit 7 is the most significant bit).
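The following Python sketch packs and unpacks a register write command. The bit positions are assumptions taken from Table 93 together with the command decode in 21.8.7.1 (data in bits 63:32, word address in bits 15:2, other bits zero); byte ordering in DRAM, which the endianness note above warns about, is deliberately not modelled. CMDSOURCE_ADR combines the PCU sub-block code 0x0 with the CmdSource offset 0x20 from Table 91, and build_pcu_program is a hypothetical helper that applies the rule from 21.8.6 that a DRAM command block ends with a write of 0 to CmdSource.

```python
CMDSOURCE_ADR = 0x0020  # PCU sub-block 0x0, CmdSource register at offset 0x20


def encode_register_write(adr: int, data: int) -> int:
    """Pack a 64-bit PEP register write command as an integer.

    Bits 63:32 hold the data and bits 15:2 the 32-bit word address; all
    other bits are zero. Byte ordering in DRAM is not modelled here.
    """
    if adr & 0x3 or adr >> 16:
        raise ValueError("adr must be a 32-bit aligned address below 0x10000")
    return ((data & 0xFFFFFFFF) << 32) | adr


def decode_register_write(cmd: int) -> tuple:
    """Return (unit, register index, data) as the PCU decodes a command."""
    unit = (cmd >> 12) & 0xF          # cmd[15:12] -> pcu_<unit>_sel
    register = (cmd >> 2) & 0x3FF     # cmd[11:2]  -> pcu_adr[11:2]
    data = (cmd >> 32) & 0xFFFFFFFF   # cmd[63:32] -> pcu_dataout
    return unit, register, data


def build_pcu_program(writes: list) -> list:
    """Encode (adr, data) writes and append the final CmdSource = 0 write."""
    program = [encode_register_write(adr, data) for adr, data in writes]
    program.append(encode_register_write(CMDSOURCE_ADR, 0))
    return program
```

Round-tripping a write to TE register 1 (address 0x5004) through the two functions returns unit 5, register 1 and the original data.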

21.8.6 General Operation

Upon a Reset condition, CmdSource is cleared (to 0), which means that all commands are initially sourced only from the CPU bus interface. Registers can then be written to or read from one location at a time via the CPU bus interface.

If CmdSource is 1, commands are sourced from the DRAM at CmdAdr or from the CPU bus. Writing an address to CmdAdr automatically sets CmdSource to 1, and causes a command stream to be retrieved from DRAM. The PCU will execute commands from the CPU or from the DRAM command stream, always giving higher priority to the CPU.

Regardless of the state of CmdSource, the DRAM requestor examines the CmdPending bits to determine if a new DRAM command stream is pending. If any of the CmdPending bits are set, then the appropriate NextBandCmdAdr is copied to CmdAdr (causing CmdSource to be set to 1) and a new DRAM command stream is retrieved from DRAM and executed by the PCU. Note that a new DRAM command stream only gets retrieved when the current command stream is empty.

If there are no DRAM commands pending, and no CPU commands, the PCU defaults to an idle state. When idle, the PCU address bus defaults to the DebugSelect register value (bits 11 to 2 in particular) and the default unit's PCU data bus is reflected to the CPU data bus. The default unit is determined by DebugSelect register bits 15 to 12.

In conjunction with this, upon receipt of a finishedBand[n] signal, NextBandCmdEnable[n] is copied to CmdPending[n] and NextBandCmdEnable[n] is cleared. Note that each of the LBD, CDU, and TE (where present) may be re-programmed individually between bands by appropriately setting NextBandCmdAdr[0], NextBandCmdAdr[1] and NextBandCmdAdr[2] respectively. However, execution of inter-band commands may be postponed until all blocks specified in the BandSelectMask register have pulsed their finishedband signal. This may be accomplished by only setting NextBandCmdAdr[3] (indirectly causing NextBandCmdEnable[3] to be set), in which case it is the finishedAllBand signal which causes NextBandCmdEnable[3] to be copied to CmdPending[3].

To conveniently update multiple registers, for example at the start of printing a page, a series of Write Register commands can be stored in DRAM. When the start address of the first Write Register command is written to the CmdAdr register (via the CPU), the CmdSource register is automatically set to 1 to start execution at CmdAdr.

The final instruction in the command block stored in DRAM must be a register write of 0 to CmdSource so that no more commands are read from DRAM. Subsequent commands will come from pending programs or can be sent via the CPU bus interface.
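The interaction between NextBandCmdAdr, NextBandCmdEnable and CmdPending can be sketched in Python as follows. The text does not say which pending stream wins when several are set, or exactly when a CmdPending bit is cleared; this model assumes the lowest index has the highest priority and that the bit is cleared as its stream starts. The class and method names are hypothetical.

```python
from typing import List, Optional


class BandCommandScheduler:
    """Model of NextBandCmdAdr[3:0], NextBandCmdEnable and CmdPending.

    Index n = 0, 1, 2 corresponds to lbd, cdu and te finishedband, and
    n = 3 to finishedAllBand.
    """

    def __init__(self) -> None:
        self.next_band_cmd_adr: List[int] = [0] * 4
        self.next_band_cmd_enable: List[int] = [0] * 4
        self.cmd_pending: List[int] = [0] * 4

    def write_next_band_cmd_adr(self, n: int, adr: int) -> None:
        # A non-zero write sets NextBandCmdEnable[n]; a zero write clears it.
        self.next_band_cmd_adr[n] = adr
        self.next_band_cmd_enable[n] = int(adr != 0)

    def finished_band(self, n: int) -> None:
        # NextBandCmdEnable[n] is copied to CmdPending[n], then cleared.
        self.cmd_pending[n] = self.next_band_cmd_enable[n]
        self.next_band_cmd_enable[n] = 0

    def start_pending_stream(self) -> Optional[int]:
        """Call when the current stream is empty; return the new CmdAdr or None.

        Assumptions: lower n has higher priority, and the pending bit is
        cleared once its stream is started.
        """
        for n in range(4):
            if self.cmd_pending[n]:
                self.cmd_pending[n] = 0
                return self.next_band_cmd_adr[n]
        return None
```

Deferring inter-band commands until all selected blocks finish corresponds to writing only NextBandCmdAdr[3] and calling finished_band(3) when finishedAllBand is given.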
21.8.6.1 Debug Mode

Debug mode is implemented by reusing the normal CPU and DRAM access decode logic. When in the Arbitrate state (see state machine A below), the PEP address bus is defaulted to the value in the DebugSelect register. The top bits of the DebugSelect register are used to decode a select to a PEP unit and the remaining bits are reflected on the PEP address bus. The selected unit's read data bus is reflected on the pcu_cpu_data bus to the RDU in the CPU. The pcu_cpu_debug_valid signal indicates to the RDU that the data on the pcu_cpu_data bus is valid debug data. The pcu_cpu_debug_valid signal is a repeated version of the selected unit's ready signal, <unit>_pcu_rdy.

Normal CPU and DRAM command accesses require the PEP bus, and as such will cause the debug data to be invalid during the access; this is indicated to the RDU by setting pcu_cpu_debug_valid to zero.

The decode logic is:

// Default Debug decode
pcu_<unit>_sel = decode(DebugSelect[15:12])
pcu_adr[11:2] = DebugSelect[11:2]
pcu_cpu_data = <unit>_pcu_datain[31:0]
pcu_cpu_debug_valid = <unit>_pcu_rdy AND state == Arbitrate


21.8.7 State Machines

DRAM command fetching and general command execution is accomplished using two state machines. State machine A evaluates whether a CPU or DRAM command is being executed, and proceeds to execute the command(s). Since the CPU has priority over the DRAM, it is permitted to interrupt the execution of a stream of DRAM commands.

State machine B decides which address should be used for DRAM access, fetches commands from DRAM and fills a command FIFO which state machine A executes. The reason for separating the two functions is to facilitate the execution of CPU or Debug commands while state machine B is performing DRAM reads and filling the command FIFO. In the case where state machine A is ready to execute commands (in its Arbitrate state) and it sees both a full DRAM command FIFO and an active cpu_pcu_sel, then the DRAM commands are executed.
21.8.7.1 State Machine A: Arbitration and execution of commands

Figure 97. Command Arbitration and execution (state machine A: Reset, Arbitrate, CpuAccess, DRAMAccess and AdrError states)

The state machine enters the Reset state when there is an active strobe on either the reset pin, prst_n, or the PCU's soft-reset register. All registers in the PCU are zeroed, unless otherwise specified, on the next rising clock edge. The PCU self-deasserts the soft reset in the pclk cycle after it has been asserted.

The state changes from Reset to Arbitrate when prst_n = 1 and pcu_softreset_n = 1.

The state machine waits in the Arbitrate state until it detects a request for CPU access to the PEP units (cpu_pcu_sel = 1 and cpu_acode = 11) or a request to execute DRAM commands (CmdSource = 1) with DRAM commands available (CmdFifoFull = 1). Note that if (cpu_pcu_sel = 1 and cpu_acode != 11) the CPU is attempting an illegal access. The PCU ignores this command and strobes pcu_cpu_berr for one cycle.

While in the Arbitrate state the machine assigns the DebugSelect register to the PCU unit decode logic and the remaining bits to the PEP address bus. When in this state the debug data returned from the selected PEP unit is reflected on the CPU bus (pcu_cpu_data bus) and pcu_cpu_debug_valid = 1.

If a CPU access request is detected (cpu_pcu_sel = 1 and cpu_acode = 11) then the machine proceeds to the CpuAccess state. In the CpuAccess state the CPU address is decoded and used to determine the PEP unit to select. The remaining address bits are passed through to the PEP address bus. The machine remains in the CpuAccess state until a valid ready from the selected PEP unit is received. When it is received, the machine returns to the Arbitrate state and the ready signal to the CPU is pulsed.

// decode the logic
pcu_<unit>_sel = decode(cpu_adr[15:12])
pcu_adr[11:2] = cpu_adr[11:2]

If, when decoding the cpu_adr bus, the address selects a reserved address, the state machine proceeds to the AdrError state, and then back to the Arbitrate state. An address error interrupt will be generated.

If the state machine detects a request to execute DRAM commands (CmdSource = 1), it will wait in the Arbitrate state until commands have been loaded into the command FIFO from DRAM (all controlled by state machine B). When the DRAM commands are available (cmd_fifo_full = 1) the state machine will proceed to the DRAMAccess state.

When in the DRAMAccess state the commands are executed from the cmd_fifo. A command in the cmd_fifo consists of 64 bits (of which the FIFO holds 4). The decoding of the 64 bits to commands is given in Table 93. For each command the decode is:

// command decode
pcu_<unit>_sel = decode(cmd_fifo[cmd_count][15:12])
pcu_adr[11:2] = cmd_fifo[cmd_count][11:2]
pcu_dataout = cmd_fifo[cmd_count][63:32]

When the selected PEP unit returns a ready signal (<unit>_pcu_rdy = 1) indicating the command has completed, the state machine will return to the Arbitrate state. If more commands exist (cmd_count != 0) the transition will decrement the command count.

When in the DRAMAccess state, if, when decoding the DRAM command address bus (cmd_fifo[cmd_count][15:12]), the address selects a reserved address, the state machine proceeds to the AdrError state, and then back to the Arbitrate state. An address error interrupt will be generated and the DRAM command FIFOs will be cleared.

A CPU access can pre-empt any pending DRAM commands. After each command is completed the state machine returns to the Arbitrate state. If a CPU command and a DRAM command are both pending, the CPU command always takes priority. If a CPU or DRAM command sets CmdSource to 0, all subsequent DRAM commands in the command FIFO are cleared. If the CPU sets CmdSource to 0, the CmdPending and NextBandCmdEnable work registers are also cleared.
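State machine A's decision in the Arbitrate state can be summarised as a pure function. This is a simplified Python sketch: it follows the stated rule that a pending CPU command takes priority over DRAM commands, folds the reserved-address check for CPU accesses into the arbitration step, and does not model the DRAMAccess path's own address check. Names other than the signal names in the text are hypothetical.

```python
RESERVED_UNITS = range(0xC, 0x10)  # Table 92: sub-block codes 0xC to 0xF are reserved
SUPERVISOR_DATA = 0b11             # cpu_acode value for a supervisor data access


def arbitrate(cpu_pcu_sel: int, cpu_acode: int, cpu_adr: int,
              cmd_source: int, cmd_fifo_full: int) -> tuple:
    """Return (next_state, strobe_berr) for state machine A in Arbitrate."""
    if cpu_pcu_sel:
        if cpu_acode != SUPERVISOR_DATA:
            return "Arbitrate", True     # illegal access: ignored, pcu_cpu_berr strobed
        if ((cpu_adr >> 12) & 0xF) in RESERVED_UNITS:
            return "AdrError", False     # reserved sub-block address
        return "CpuAccess", False
    if cmd_source and cmd_fifo_full:
        return "DRAMAccess", False       # DRAM commands loaded by state machine B
    return "Arbitrate", False            # idle: DebugSelect drives the PEP bus
```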
21.8.7.2 State Machine B: Fetching DRAM commands

Figure 98. DRAM command access state machine (state machine B: Reset, Wait, FillFifo, Data1, Data2 and Data3 states)



A system reset (prst_n = 0) or a software reset (pcu_softreset_n = 0) will cause the state machine to reset to the Reset state. The state machine remains in the Reset state until both reset conditions are removed. When they are removed, the machine proceeds to the Wait state.

The state machine waits in the Wait state until it determines that commands are needed from DRAM. Two possible conditions exist that require DRAM access. Either the PCU is processing commands which must be fetched from DRAM (cmd_source = 1) and the command FIFO is empty (cmd_fifo_full = 0), or the command FIFO is empty and there are some commands pending (cmd_pending != 0). In either of these conditions the machine proceeds to the FillFifo state and issues a read request to DRAM (pcu_diu_rreq = 1). It calculates the address to read from depending on the transition condition. In the command pending transition condition, the highest priority NextBandCmdAdr that is pending is used for the read address (pcu_diu_radr) and is also copied to the CmdAdr register. In the normal PCU processing transition, pcu_diu_radr is the CmdAdr register.
In the FillFifo state the machine waits for the DRAM to respond to the read request and transfer data words. On receipt of the first word of data (diu_pcu_rvalid = 1), the machine stores the 64-bit data word in the command FIFO (cmd_fifo[0]) and transitions through the Data1, Data2 and Data3 states, each time waiting for diu_pcu_rvalid = 1 and storing the transferred data word to cmd_fifo[1], cmd_fifo[2] and cmd_fifo[3] respectively.

When the transfer is complete the machine returns to the Wait state, setting cmd_count to 3 and cmd_fifo_full = 1.
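The Wait-state decision of state machine B reduces to choosing a read address, if any. In this Python sketch the current stream (cmd_source = 1) is served before pending band programs, matching the rule in 21.8.6 that a new stream is only fetched when the current one is empty; the priority order among pending bits (lowest index first) is an assumption, since the text only refers to "the highest priority NextBandCmdAdr". The function name is hypothetical.

```python
from typing import Optional, Sequence


def dram_read_address(cmd_source: int, cmd_fifo_full: int, cmd_pending: int,
                      cmd_adr: int,
                      next_band_cmd_adr: Sequence[int]) -> Optional[int]:
    """Return the pcu_diu_radr to request from the Wait state, or None.

    cmd_pending is the 4-bit CmdPending register value.
    """
    if cmd_fifo_full:
        return None                      # FIFO still holds unexecuted commands
    if cmd_source:
        return cmd_adr                   # continue the current stream at CmdAdr
    for n in range(4):                   # assumption: bit 0 has highest priority
        if (cmd_pending >> n) & 1:
            return next_band_cmd_adr[n]  # the hardware also copies this to CmdAdr
    return None                          # nothing to fetch: stay in Wait
```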

21.8.7.3 PCU_ICU_Address_Invalid Interrupt

When the PCU is executing commands, addresses decoded from commands which are not PCU mapped addresses (4 bits only) will result in the current command being ignored and the pcu_invalid_address interrupt signal being strobed. If the command is from DRAM, all remaining commands already retrieved from DRAM are flushed from the CmdFifo, and CmdPending, NextBandCmdEnable and CmdSource are cleared to zero. These registers are unaffected if the command is from the CPU. The CPU can then interrogate the PCU to find the source of the illegal command via the InvalidAddress register.
22 Contone Decoder Unit (CDU)

22.1 Overview

The Contone Decoder Unit (CDU) is responsible for performing the optional decompression of the contone data layer.

The input to the CDU is up to 4 planes of compressed contone data in JPEG interleaved format. This will typically be 3 planes, representing a CMY contone image, or 4 planes representing a CMYK contone image. The CDU must support a page of A4 length (11.7 inches) and Letter width (8.5 inches) at a resolution of 267 ppi in 4 colors and a print speed of 1 side per 2 seconds.

The CDU and the other page expansion units support the notion of page banding. A compressed page is divided into one or more bands, with a number of bands stored in memory. As a band of the page is consumed for printing, a new band can be downloaded. The new band may be for the current page or the next page. Band-finish interrupts have been provided to notify the CPU of free buffer space.

The compressed contone data is read from the on-chip DRAM. The output of the CDU is the decompressed contone data, separated into planes. The decompressed contone image is written to a circular buffer in DRAM with an expected minimum size of 12 lines and a configurable maximum. The decompressed contone image is subsequently read a line at a time by the CFU, optionally color converted, scaled up to 1600 ppi and then passed on to the HCU for the next stage in the printing pipeline. The CDU also outputs a cdu_finishedband control flag indicating that the CDU has finished reading a band of compressed contone data in DRAM and that area of DRAM is now free. This flag is used by the PCU and is available as an interrupt to the CPU.



22.2 Storage requirements for decompressed contone data in DRAM

A single SoPEC must support a page of A4 length (11.7 inches) and Letter width (8.5 inches) at a resolution of 267 ppi in 4 colors and a print speed of 1 side per 2 seconds. The printheads specified in the Bilithic Printhead Specification [2] have 13824 nozzles per color to provide full bleed printing for A4 and Letter. At 267 ppi, there are 2304 contone pixels¹ per line, represented by 288 JPEG blocks per color. However, each of these blocks actually stores data for 8 lines, since a single JPEG block is 8 x 8 pixels. The CDU produces contone data for 8 lines in parallel, while the HCU processes data linearly across a line on a line by line basis. The contone data is decoded only once and then buffered in DRAM. This means we require two sets of 8 buffer lines: one set of 8 buffer lines is being consumed by the CFU while the other set of 8 buffer lines is being generated by the CDU.

The buffer requirement can be reduced by using a 1.5 buffering scheme, where the CDU fills 8 lines while the CFU consumes 4 lines. The buffer space required is a minimum of 12 line stores per color, for a total space of 108 KBytes². A circular buffer scheme is employed whereby the CDU may only begin to write a line of JPEG blocks (equal to 8 lines of contone data) when there are 8 lines free in the buffer. Once the full 8 lines have been written by the CDU, the CFU may begin to read them on a line by line basis.

This reduction in buffering comes with the cost of an increased peak bandwidth requirement for the CDU write access to DRAM. The CDU must be able to write the decompressed contone at twice the rate at which the CFU reads the data. To allow for trade-offs to be made between peak bandwidth and amount of storage, the size of the circular buffer is configurable. For example, if the circular buffer is configured to be 16 lines it behaves like a double-buffer scheme where the peak bandwidth requirements of the CDU and CFU are equal. An increase over 16 lines allows the CDU to write ahead of the CFU and provides it with a margin to cope with very poor local compression ratios in the image.

1. Pixels may be 8, 16, 24 or 32 bits depending on the number of color planes (8 bits per color).

2. 12 lines x 4 colors x 2304 bytes (assumes 267 ppi, 4 color, full bleed A4/Letter).
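The storage figures in this section can be reproduced with a short Python sketch. It assumes a scale factor of 6 between the 267 ppi contone data and the 1600 dpi output (13824 / 2304 = 6) and 8 bits per color plane; the function names are illustrative.

```python
def contone_pixels_per_line(nozzles_per_color: int, scale_factor: int) -> int:
    """Contone pixels per line when each pixel is replicated scale_factor times."""
    return nozzles_per_color // scale_factor


def jpeg_blocks_per_line(pixels_per_line: int) -> int:
    """Number of 8 x 8 JPEG blocks needed to span one line (per color)."""
    return -(-pixels_per_line // 8)  # ceiling division


def contone_buffer_bytes(pixels_per_line: int, colors: int = 4,
                         lines: int = 12) -> int:
    """Circular-buffer size at 8 bits per color; 12 lines is 1.5 buffering."""
    return lines * colors * pixels_per_line
```

With 13824 nozzles and a scale factor of 6 this gives 2304 pixels and 288 JPEG blocks per line, and a 12-line buffer of 110592 bytes (108 KBytes). The same arithmetic gives 3248 contone pixels per line for the 19488-nozzle A3 printhead, and a 16-line (double-buffered) store at 267 ppi needs 147456 bytes.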

SoPEC should also provide support for A3 printing and printing at resolutions above 267 ppi. This increases the storage requirement for the decompressed contone data (buffer) in DRAM. Table 94 gives the storage requirements for the decompressed contone data at some sample contone resolutions for different page sizes. It assumes 4 color planes of contone data and a 1.5 buffering scheme.

Table 94. Storage requirements for decompressed contone data (buffer)

a. Required for CFU to convert to final output at 1600 dpi

b. Bi-lithic printhead has 13824 nozzles per color providing full bleed printing for A4/Letter

c. Bi-lithic printhead has 19488 nozzles per color providing full bleed printing for A3

d. 12 lines x 4 colors x 2304 bytes.

22.3 Decompression performance requirements

The JPEG decoder core can produce a single color pixel every system clock (pclk) cycle, making it capable of decoding at a peak output rate of 8 bits/cycle. SoPEC processes 1 dot (bi-level in 6 colors) per system clock cycle to achieve a print speed of 1 side per 2 seconds for full bleed A4/Letter printing. The CFU replicates pixels a scale factor (SF) number of times in both the horizontal and vertical directions to convert the final output to 1600 dpi. Thus the CFU consumes a 4 color pixel (32 bits) every SF x SF cycles.

The 1.5 buffering scheme described in section 22.2 on page 265 means that the CDU must write the data at twice this rate. With support for 4 colors at 267 ppi, the decompression output bandwidth requirement is 1.78 bits/cycle.

The JPEG decoder is fed directly from the main memory via the DRAM interface. The amount of compression determines the input bandwidth requirements for the CDU. As the level of compression increases the bandwidth decreases, but the quality of the final output image can also decrease. Although the average compression ratio for contone data is expected to be 10:1, the average bandwidth allocated to the CDU allows for a local minimum compression ratio of 5:1 over a single line of JPEG blocks. This equates to a peak input bandwidth requirement of 0.36 bits/cycle (1.78 / 5) for 4 colors at 267 ppi, full bleed A4/Letter printing at 1 side per 2 seconds.

Table 95 shows the decompression bandwidth requirements for different resolutions of contone data to meet a print speed of 1 side per 2 seconds. Higher resolution requires higher bandwidth and larger storage for decompressed contone data in DRAM. A resolution of 400 ppi contone data in 4 colors requires 4 bits/cycle, which is practical using a 1.5 buffering scheme. However, a resolution of 800 ppi would require a double buffering scheme (16 lines) so the CDU only has to match the CFU consumption rate. In this case the decompression output bandwidth requirement is 8 bits/cycle (32 bits every 2 x 2 cycles), which equals the peak output rate of the JPEG decoder core.

a. 2 x ((4 colors x 8 bits) / (6 x 6 cycles)) = 1.78 bits/cycle
b. 2 x ((4 colors x 8 bits) / (4 x 4 cycles)) = 4 bits/cycle
c. Scale factor 2 requires at least a 16 line buffer.
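The bandwidth figures above follow from a few lines of arithmetic. The sketch below reproduces them, assuming 4 contone colors of 8 bits, SF = 6 at 267 ppi, 4 at 400 ppi and 2 at 800 ppi, and the 5:1 local minimum compression ratio; the function names are illustrative only and are not part of the design.

```python
# Sketch: CDU decompression bandwidth (section 22.3).
# Assumes 4 colors x 8 bits per contone pixel and a 1600 dpi output.

def cfu_consumption(sf: int, colors: int = 4) -> float:
    """Bits/cycle consumed by the CFU: one pixel every SF x SF cycles."""
    return colors * 8 / (sf * sf)


def cdu_output(sf: int, scheme: str, colors: int = 4) -> float:
    """1.5 buffering: write at twice the CFU rate; double buffering: match it."""
    factor = {"1.5": 2, "double": 1}[scheme]
    return factor * cfu_consumption(sf, colors)


def cdu_peak_input(output_bits: float, min_ratio: float = 5.0) -> float:
    """Compressed input bandwidth at the local minimum compression ratio."""
    return output_bits / min_ratio


for ppi, sf, scheme in [(267, 6, "1.5"), (400, 4, "1.5"), (800, 2, "double")]:
    print(f"{ppi} ppi, SF={sf}: {cdu_output(sf, scheme):.2f} bits/cycle")
print(f"peak input at 267 ppi: {cdu_peak_input(cdu_output(6, '1.5')):.2f} bits/cycle")
```

Run as-is, this prints output rates of 1.78, 4.00 and 8.00 bits/cycle and a peak input rate of 0.36 bits/cycle, matching the footnotes and the text.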



22.4 Data flow

Figure 99. Outline of contone data flow with respect to CDU

Although the four contone colors may be C, M, Y and K, directly represented by CMYK inks, they may also represent other inks, for example gold, metallic green etc. for multi-SoPEC printing with exact colors.

JPEG produces better compression ratios for a given visible quality when luminance and chrominance channels are separated. With CMYK, K can be considered to be luminance, but C, M and Y each contain luminance information, and so would need to be compressed with appropriate luminance tables. We therefore provide the means by which CMY can be passed to SoPEC as YCrCb. K does not need color conversion.

When being JPEG compressed, CMY is typically converted to RGB, then to YCrCb and then finally JPEG compressed. At decompression, the YCrCb data is obtained and written to the decompressed contone store by the CDU. This is read by the CFU, where the YCrCb can then be optionally color converted to RGB, and finally back to CMY.




SoPEC : Hardware Design 



The external RIP provides conversion from RGB to YCrCb, specifically to match the actual hardware implementation of the inverse transform within SoPEC, as per CCIR 601-2 [20] except that Y, Cr and Cb are normalized to occupy all 256 levels of an 8-bit binary encoding.

The CFU provides the translation to either RGB or CMY. RGB is included since it is a necessary step to produce CMY, and some printers increase their color gamut by including RGB inks as well as CMYK.
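As an illustration of the decompression-side color path (YCrCb to RGB to CMY), the following sketch uses the standard full-range (JFIF-style) form of the CCIR 601-2 inverse transform. This is an assumption: the exact fixed-point coefficients of SoPEC's hardware transform are not given in this section, and CMY is taken here as the simple complement of RGB.

```python
# Sketch of YCrCb (full 0-255 range) -> RGB -> CMY.
# ASSUMPTION: standard JFIF full-range coefficients, not SoPEC's actual
# fixed-point hardware values; CMY is the plain complement of RGB.

def clamp8(x: float) -> int:
    """Round and clamp to an 8-bit value."""
    return max(0, min(255, int(round(x))))


def ycrcb_to_rgb(y: int, cr: int, cb: int) -> tuple[int, int, int]:
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


def rgb_to_cmy(r: int, g: int, b: int) -> tuple[int, int, int]:
    return 255 - r, 255 - g, 255 - b


def ycrcb_to_cmy(y: int, cr: int, cb: int) -> tuple[int, int, int]:
    return rgb_to_cmy(*ycrcb_to_rgb(y, cr, cb))


# Neutral inputs (Cr = Cb = 128) give equal amounts of C, M and Y.
print(ycrcb_to_cmy(255, 128, 128))  # (0, 0, 0): no ink
print(ycrcb_to_cmy(0, 128, 128))    # (255, 255, 255): full CMY
```

K is passed through unchanged, since it does not need color conversion.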
22.5 Implementation

A block diagram of the CDU is shown in Figure 100. 



[Figure 100: block diagram of the Contone Decoder Unit, showing the compressed contone FIFO, JPEG decoder, half-block buffer interface, contone line store interface and configuration registers, with connections to the PEP Controller Unit, the Contone FIFO Unit and the Clock, Power and Reset block.]

Figure 100. Block diagram of CDU
All output signals from the CDU (cdu_cfu_wradv8line, cdu_finishedband, cdu_icu_jpegerror and the control signals to the DIU) must always be valid after reset.

The read control unit is responsible for keeping the JPEG decoder's input FIFO full.



22.5.1 Definitions of I/O

Table 96. CDU port list and description

Each entry gives the port name, followed by the number of pins and the direction.

Clock and reset
pclk (1, In): System clock.
jclk (1, In): Gated version of system clock used to clock the JPEG decoder core and logic at the output of the core. Allows for stalling of the JPEG core at a pixel sample boundary.
jclk_enable (1, Out): Gating signal for jclk. When jclk_enable is 0, jclk is held at 0.
prst_n (1, In): System reset, synchronous active low.
jrst_n (1, In): Reset for jclk domain, synchronous active low.

PCU interface
pcu_cdu_sel (1, In): Block select from the PCU. When pcu_cdu_sel is high both pcu_adr and pcu_dataout are valid.
pcu_rwn (1, In): Common read/not-write signal from the PCU.
pcu_adr[7:2] (6, In): PCU address bus. Only 6 bits are required to decode the address space for this block.
pcu_dataout[31:0] (32, In): Shared write data bus from the PCU.
cdu_pcu_rdy (1, Out): Ready signal to the PCU. When cdu_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block, and for a read cycle this means the data on cdu_pcu_data is valid.
cdu_pcu_data[31:0] (32, Out): Read data bus to the PCU.

DIU read interface
cdu_diu_rreq (1, Out): CDU read request, active high. A read request must be accompanied by a valid read address.
diu_cdu_rack (1, In): Acknowledge from DIU, active high. Indicates that a read request has been accepted and the new read address can be placed on the address bus, cdu_diu_radr.
cdu_diu_radr[21:5] (17, Out): CDU read address. 17 bits wide (256-bit aligned word).
diu_cdu_rvalid (1, In): Read data valid, active high. Indicates that valid read data is now on the read data bus, diu_data.

DIU write interface
cdu_diu_wreq (1, Out): CDU write request, active high. A write request must be accompanied by a valid write address and valid write data.
diu_cdu_wack (1, In): Acknowledge from DIU, active high. Indicates that a write request has been accepted and the new write address can be placed on the address bus, cdu_diu_wadr.
cdu_diu_wadr[21:3] (19, Out): CDU write address. 19 bits wide (64-bit aligned word).
cdu_diu_wvalid (1, Out): Write data valid, active high. Indicates that valid data is now on the write data bus, cdu_diu_data.
cdu_diu_data[63:0] (64, Out): Write data bus.

CFU interface
cdu_cfu_wradv8line (1, Out): Write advance 8 lines, active high. Indicates that the CDU has finished writing 8 lines of decompressed contone data to the circular buffer in DRAM and that the lines are available to be read by the CFU.

TE and LBD interface
cdu_start_of_bandstore[21:5] (17, Out): Points to the 256-bit word that defines the start of the memory area allocated for page bands.
cdu_end_of_bandstore[21:5] (17, Out): Points to the 256-bit word that defines the last address of the memory area allocated for page bands.

ICU interface
cdu_finishedband (1, Out): CDU's finishedBand flag, active high. Interrupt to the CPU to indicate that the CDU has finished processing a band of compressed contone data in DRAM and that area of DRAM is now free. This signal goes to both the interrupt controller and the PCU.
cdu_icu_jpegerror (1, Out): Active high interrupt indicating an error has occurred in the JPEG decoder and decompression has stopped. A reset of the CDU must be performed to clear this interrupt.
22.5.2 Configuration registers

The configuration registers in the CDU are programmed via the PCU interface. Refer to section 21.8.2 for details.

Table 97. Registers shared between the CDU, LBD and TE

Setup registers (remain constant during the processing of multiple bands)

0x80  StartOfBandStore[21:5] (17 bits, reset value 0x0_0000)
    Points to the 256-bit word that defines the start of the memory area allocated for page bands. Circular address generation wraps to this start address.

0x84  EndOfBandStore[21:5] (17 bits, reset value 0x1_FFFF)
    Points to the 256-bit word that defines the last address of the memory area allocated for page bands. If the current read address is this address, then instead of adding 1 to the current address, the current address is loaded from the StartOfBandStore register.

Table 98. CDU registers

Control registers

0x00  Reset (1 bit, reset value 0x1)
    A write to this register causes a reset of the CDU. This terminates all internal operations within the CS6150. All configuration data previously loaded into the core, except for the tables, is deleted.

0x04  Go (1 bit, reset value 0x0)
    Writing 1 to this register starts the CDU. Writing 0 to this register halts the CDU. When Go is deasserted the state machines go to their idle states, but all counters and configuration registers keep their values. When Go is asserted all counters are reset, but configuration registers keep their values (i.e. they don't get reset). NextBandEnable is cleared when Go is asserted. The CFU must be started before the CDU is started. This register can be read to determine if the CDU is running (1 - running, 0 - stopped).

Setup registers

0x10  MaxPlane (2 bits, reset value 0x0)
    Defines the number of contone planes - 1. For example, this will be 0 for K (greyscale printing), 2 for CMY, and 3 for CMYK.

0x14  MaxBlock
    Defines the number of JPEG blocks in a line of JPEG blocks - 1.

0x18  BuffStartAdr[21:7] (15 bits)
    Points to the start of the decompressed contone circular buffer in DRAM, expressed in terms of half JPEG blocks. A half JPEG block consists of 4 words of 256 bits.

0x1C  BuffEndAdr[21:7] (15 bits)
    Points to the start of the last half JPEG block at the end of the decompressed contone circular buffer in DRAM, expressed in terms of half JPEG blocks. A half JPEG block consists of 4 words of 256 bits.

0x20  NumBuffLines
    Defines the size of the buffer in DRAM in terms of the number of decompressed contone lines. The size of the buffer should be a multiple of 4 lines with a minimum size of 8 lines.

0x24  BypassJpg (1 bit)
    Determines whether or not the JPEG decoder will be bypassed (and hence pixels are copied directly from input to output): 0 - don't bypass, 1 - bypass. Should not be changed between bands.

0x30  NextBandCurrSourceAdr[21:5] (17 bits)
    The 256-bit aligned word address containing the start of the next band of compressed contone data in DRAM. This value is copied to CurrSourceAdr when both DoneBand is 1 and NextBandEnable is 1, or when Go transitions from 0 to 1.

0x34  NextBandEndSourceAdr[21:3] (19 bits)
    The 64-bit aligned word address containing the last bytes of the next band of compressed contone data in DRAM. This value is copied to EndSourceAdr when both DoneBand is 1 and NextBandEnable is 1, or when Go transitions from 0 to 1.

0x38  NextBandValidBytesLastFetch (8 bits)
    Mask containing a 1 in each bit position that represents a valid byte in the last 64-bit fetch of the next band of compressed contone data from DRAM. This value is copied to ValidBytesLastFetch when both DoneBand is 1 and NextBandEnable is 1, or when Go transitions from 0 to 1.

0x3C  NextBandEnable (1 bit)
    When NextBandEnable is 1 and DoneBand is 1, then when cdu_finishedband is set at the end of a band:
    • NextBandCurrSourceAdr is copied to CurrSourceAdr
    • NextBandEndSourceAdr is copied to EndSourceAdr
    • NextBandValidBytesLastFetch is copied to ValidBytesLastFetch
    • DoneBand is cleared
    • NextBandEnable is cleared.
    NextBandEnable is cleared when Go is asserted.

0x40  DoneBand (1 bit)
    Specifies whether or not the current band has finished loading into the local FIFO. It is cleared to 0 when Go transitions from 0 to 1. When the last of the compressed contone data for the band has been loaded into the local FIFO, the cdu_finishedband signal is given out and the DoneBand flag is set. If NextBandEnable is 1 at this time then CurrSourceAdr, EndSourceAdr and ValidBytesLastFetch are updated with the values for the next band and DoneBand is cleared. Processing of the next band starts immediately. If NextBandEnable is 0 then the remainder of the CDU will continue to run, decompressing the data already loaded, while the read control unit waits for NextBandEnable to be set before it restarts.

0x44  CurrSourceAdr[21:5] (17 bits)
    The current 256-bit aligned word address within the current band of compressed contone data in DRAM.

0x48  EndSourceAdr[21:3] (19 bits)
    The 64-bit aligned word address containing the last bytes of the current band of compressed contone data in DRAM.

0x4C  ValidBytesLastFetch (8 bits)
    Mask containing a 1 in each bit position that represents a valid byte in the last 64-bit fetch of the current band of compressed contone data from DRAM. If the lower 3 bytes are valid, then the lower 3 bits of ValidBytesLastFetch should be set and the upper 5 bits should be clear.

JPEG decoder core setup registers

0x50  JpgDecMask (5 bits)
    As segments are decoded they can also be output on the DecJpg (JpgDecHdr) port, with the user selecting the segments for output by setting bits in the JpgDecMask port as follows:
    bit 4 - SOF+SOS+DNL
    bit 3 - COM+APP
    bit 2 - DRI
    bit 1 - DQT
    bit 0 - DHT
    If any one of the bits of JpgDecMask is asserted then the SOI and EOI markers are also passed to the DecJpg port.

0x54  JpgDecTType (1 bit)
    Test type selector: 0 - DCT coefficients displayed on JpgDecTData, 1 - QDCT coefficients displayed on JpgDecTData.

0x58  JpgDecTestEn
    Signal which causes the memories to be bypassed for test purposes.

0x5C  JpgDecPType
    Signal specifying parameters to be placed on port JpgDecPValue (see Table 99).

JPEG decoder core read-only status registers

JpgDecHdr
    Selected header segments from the JPEG stream that is currently being decoded. Segments are selected using JpgDecMask.
0x64  JpgDecTData
    Test data output. One flag bit marks the first output byte of the first 8x8 block of the test data, and another marks the output byte of each 8x8 block of test data. The low-order bits form the output test data port, which displays DCT coefficients or quantized coefficients depending on the value of JpgDecTType.

JpgDecPValue (16 bits, reset value 0x0000)
    Decoding parameter bus which enables various parameters used by the core to be read. The data available on the PValue port is for information only and does not contain control signals for the decoder core.

JpgDecStatus (22 bits, reset value 0x00_0000)
    Bit 21 - jpg_core_stall (if set, indicates that the JPEG core is stalled by gating of jclk as the output JPEG half-block double-buffers of the CDU are full)
    Bit 20 - pixel_out_valid (this signal is an output from the JPEG decoder core and is asserted when a pixel is being output)
    Bits 19-16 - fifo_contents (FIFO at input of JPEG decoder core)
    Bits 15-0 - status flags from the CS6150 (see Table 100 for a description of the bits)

22.5.3 Typical operation

The CDU should only be started after the CFU has been started.

For the first band of data, users set up NextBandCurrSourceAdr, NextBandEndSourceAdr, NextBandValidBytesLastFetch, and the various MaxPlane, MaxBlock, BuffStartAdr, BuffEndAdr and NumBuffLines registers. Users then set the CDU's Go bit to start processing of the band. When the compressed contone data for the band has finished being read in, the cdu_finishedband interrupt is sent to the PCU and CPU, indicating that the memory associated with the first band is now free. Processing can now start on the next band of contone data.

In order to process the next band, NextBandCurrSourceAdr, NextBandEndSourceAdr and NextBandValidBytesLastFetch need to be updated before finally writing a 1 to NextBandEnable. There are 4 mechanisms for restarting the CDU between bands:

a. cdu_finishedband causes an interrupt to the CPU. The CDU will have set its DoneBand bit. The CPU reprograms the NextBandCurrSourceAdr, NextBandEndSourceAdr and NextBandValidBytesLastFetch registers, and sets NextBandEnable to restart the CDU.
b. The CPU programs the CDU's NextBandCurrSourceAdr, NextBandEndSourceAdr and NextBandValidBytesLastFetch registers and sets the NextBandEnable bit before the end of the current band. At the end of the current band the CDU sets DoneBand. As NextBandEnable is already 1, the CDU starts processing the next band immediately.
c. The PCU is programmed so that cdu_finishedband triggers the PCU to execute commands from DRAM to reprogram the NextBandCurrSourceAdr, NextBandEndSourceAdr and NextBandValidBytesLastFetch registers and set the NextBandEnable bit to start the CDU processing the next band. The advantage of this scheme is that the CPU could process band headers in advance and store the band commands in DRAM ready for execution.
d. This is a combination of b and c above. The PCU (rather than the CPU in b) programs the CDU's NextBandCurrSourceAdr, NextBandEndSourceAdr and NextBandValidBytesLastFetch registers and sets the NextBandEnable bit before the end of the current band. At the end of the current band the CDU sets DoneBand and pulses cdu_finishedband. As NextBandEnable is already 1, the CDU starts processing the next band immediately. Simultaneously, cdu_finishedband triggers the PCU to fetch commands from DRAM. The CDU will have restarted by the time the PCU has fetched commands from DRAM. The PCU commands program the CDU's next band shadow registers and set the NextBandEnable bit.

If an error occurs in the JPEG stream, the JPEG decoder will suspend its operation, an error bit will be set in the JpgDecStatus register, and the core will ignore any input data and await a reset before starting decoding again. An interrupt is sent to the CPU by asserting cdu_icu_jpegerror, and the CDU should then be reset by means of a write to its Reset register before a new page can be printed.
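The band-chaining rules for the NextBand shadow registers can be modelled in software. The sketch below is a behavioural model of the rules stated in Table 98 and in this section, exercising mechanisms a and b; the Python class and method names are hypothetical, and only the register names come from this document.

```python
from dataclasses import dataclass


@dataclass
class CduBandRegs:
    """Hypothetical model of the DoneBand / NextBandEnable copy rules."""
    next_band_curr_source_adr: int = 0
    next_band_end_source_adr: int = 0
    next_band_valid_bytes_last_fetch: int = 0
    next_band_enable: int = 0
    done_band: int = 0
    curr_source_adr: int = 0
    end_source_adr: int = 0
    valid_bytes_last_fetch: int = 0
    go: int = 0

    def _load_next_band(self) -> None:
        self.curr_source_adr = self.next_band_curr_source_adr
        self.end_source_adr = self.next_band_end_source_adr
        self.valid_bytes_last_fetch = self.next_band_valid_bytes_last_fetch

    def write_go(self, value: int) -> None:
        if value == 1 and self.go == 0:
            # Go 0 -> 1: shadow values are copied; DoneBand and NextBandEnable cleared.
            self._load_next_band()
            self.done_band = 0
            self.next_band_enable = 0
        self.go = value

    def _try_advance(self) -> bool:
        """If DoneBand and NextBandEnable are both 1, start the next band."""
        if self.done_band and self.next_band_enable:
            self._load_next_band()
            self.done_band = 0
            self.next_band_enable = 0
            return True
        return False

    def band_finished(self) -> bool:
        """Last compressed data of the band loaded: DoneBand set, cdu_finishedband pulses."""
        self.done_band = 1
        return self._try_advance()

    def write_next_band_enable(self, value: int) -> bool:
        self.next_band_enable = value
        return self._try_advance()


regs = CduBandRegs(next_band_curr_source_adr=0x100)
regs.write_go(1)
print(regs.band_finished())  # False: mechanism a, CPU must now set NextBandEnable
```

In mechanism a the CPU writes NextBandEnable after DoneBand is set; in mechanism b it is already 1 when the band finishes, so the next band starts without waiting.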

22.5.4 Read control unit

The read control unit is responsible for reading the compressed contone data and passing it to the JPEG decoder via the FIFO. The compressed contone data is read from DRAM in single 256-bit accesses, receiving the data from the DIU over 4 clock cycles (64 bits per cycle). The protocol and timing for read accesses to DRAM is described in section 20.9.1 on page 208. Read accesses to DRAM are implemented by means of the state machine described in Figure 101.

All counters and flags should be cleared after reset. When Go transitions from 0 to 1 all counters and flags should take their initial value. While the Go bit is set, the state machine relies on the DoneBand bit to tell it whether to attempt to read a band of compressed contone data. When DoneBand is set, the state machine does nothing. When DoneBand is clear, the state machine continues to load data into the JPEG input FIFO, up to 256 bits at a time, while there is space available in the FIFO. Note that the state machine has no knowledge about numbers of blocks or numbers of color planes: it merely keeps the JPEG input FIFO full by consecutive reads from DRAM. The DIU is responsible for ensuring that DRAM requests are satisfied at least at the peak DRAM read bandwidth of 0.36 bits/cycle (see section 22.3).

A modulo 4 counter, rd_count, is used to count each of the 64-bit values received in a 256-bit read access. It is incremented whenever diu_cdu_rvalid is asserted. As each 64-bit value is returned, indicated by diu_cdu_rvalid being asserted, curr_source_adr is compared to both end_source_adr and end_of_bandstore:

• If {curr_source_adr, rd_count} equals end_source_adr, the end_of_band control signal sent to the FIFO is 1 (to signify the end of the band), the cdu_finishedband signal is output, and the DoneBand bit is set. The remaining 64-bit values of the access are ignored, i.e. they are not written into the FIFO.
• If rd_count equals 3 and {curr_source_adr, rd_count} does not equal end_source_adr, then curr_source_adr is updated to be either start_of_bandstore or curr_source_adr + 1, depending on whether curr_source_adr also equals end_of_bandstore. The end_of_band control signal sent to the FIFO is 0.

curr_source_adr is output to the DIU as cdu_diu_radr.

A count is kept of the number of 64-bit values in the FIFO. When diu_cdu_rvalid is 1 and ignore_data is 0, data is written to the FIFO by asserting fifo_wr_en, and fifo_contents[3:0] and fifo_wr_adr[2:0] are both incremented.

When fifo_contents[3:0] is greater than 0, jpg_in_strb is asserted to indicate that there is data available in the FIFO for the JPEG decoder core. The JPEG decoder core asserts jpg_in_rdy when it is ready to receive data from the FIFO. Note that it is also possible to bypass the JPEG decoder core by setting the BypassJpg register to 1. In this case data is sent directly from the FIFO to the half-block double-buffer. While the JPEG decoder is not stalled (jpg_core_stall equal to 0), and jpg_in_rdy (or BypassJpg) and jpg_in_strb are both 1, a byte of data is consumed by the JPEG decoder core. fifo_rd_adr[5:0] is then incremented to select the next byte. The read address is byte aligned, i.e. the upper 3 bits are input as the read address for the FIFO and the lower 3 bits select a byte within the 64-bit value.
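The address sequence produced by the read control unit can be sketched as follows. This is a behavioural model, not the RTL: curr_source_adr, start_of_bandstore and end_of_bandstore are 256-bit word addresses, end_source_adr is a 64-bit word address compared against {curr_source_adr, rd_count}, and the function name read_band is hypothetical.

```python
def read_band(curr_source_adr: int, end_source_adr: int,
              start_of_bandstore: int, end_of_bandstore: int) -> list[int]:
    """Return the 64-bit word addresses written into the FIFO for one band."""
    fetched = []
    accesses_allowed = end_of_bandstore - start_of_bandstore + 1
    for _ in range(accesses_allowed):
        for rd_count in range(4):                  # one 256-bit access = 4 x 64 bits
            adr64 = (curr_source_adr << 2) | rd_count   # {curr_source_adr, rd_count}
            fetched.append(adr64)
            if adr64 == end_source_adr:
                # end_of_band = 1, DoneBand set; rest of the access is ignored.
                return fetched
        # rd_count == 3 and not at the end: advance with circular wrap.
        if curr_source_adr == end_of_bandstore:
            curr_source_adr = start_of_bandstore
        else:
            curr_source_adr += 1
    raise ValueError("end_source_adr was not reached inside the band store")


# Band starting at 256-bit word 12 of a band store spanning words 10-13,
# ending in the second 64-bit value of word 11 (after wrapping).
print(read_band(12, (11 << 2) | 1, 10, 13))
```

The example band wraps from word 13 back to word 10 and stops part-way through the access to word 11, which is the case the ignore_data signal exists for.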
[Figure 101: state diagram of the read control unit with states idle, ack and read, driving cdu_diu_rreq and ignore_data; transitions depend on diu_cdu_rack, diu_cdu_rvalid and whether {curr_source_adr, rd_count} equals end_source_adr.]

Figure 101. State machine to read compressed contone data



22.5.5 Compressed contone FIFO

The compressed contone FIFO is 8 x 64 bits (to accommodate two 256-bit accesses). Each 64-bit value is written to the FIFO together with an end_of_band bit supplied by the read control unit. The end_of_band bit is 1 if this is the last 64-bit value of the band. When end_of_band = 1 on the last transfer, the ValidBytesLastFetch register is also copied to an image version of the same.
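The byte-level view of the FIFO can be sketched as below: fifo_rd_adr[5:3] selects one of the 8 x 64-bit entries and fifo_rd_adr[2:0] a byte within it, while the ValidBytesLastFetch mask discards the unused bytes of a band's final fetch. The byte numbering (byte k in bits 8k+7 to 8k) is an assumption, since the byte order is not stated here, and the helper names are hypothetical.

```python
FIFO_ENTRIES = 8  # 8 x 64 bits = two 256-bit DRAM accesses


def valid_bytes_mask(n_valid: int) -> int:
    """ValidBytesLastFetch value when only the lower n_valid bytes are valid."""
    if not 1 <= n_valid <= 8:
        raise ValueError("between 1 and 8 bytes of the last fetch must be valid")
    return (1 << n_valid) - 1


def fifo_bytes(entries: list[int], last_mask: int = 0xFF) -> list[int]:
    """Bytes in fifo_rd_adr order; the last entry is taken as the band's last fetch."""
    if len(entries) > FIFO_ENTRIES:
        raise ValueError("the FIFO holds at most 8 x 64-bit entries")
    out = []
    for fifo_rd_adr in range(8 * len(entries)):
        word, byte = fifo_rd_adr >> 3, fifo_rd_adr & 0x7   # [5:3] entry, [2:0] byte
        if word == len(entries) - 1 and not (last_mask >> byte) & 1:
            continue  # invalid byte in the last 64-bit fetch of the band
        out.append((entries[word] >> (8 * byte)) & 0xFF)
    return out


print(bin(valid_bytes_mask(3)))  # 0b111: lower 3 bits set, upper 5 clear
```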
22.5.6 CS6150 JPEG decoder 

JPEG decoder functionality is provided by SoPEC's modified version of the CS6150 JPEG decoder core (Amphion have stated that the CS6150 JPEG decoder core can run at 185 MHz). The core is clocked by jclk, which is a gated version of the system clock pclk. Gating the clock provides a mechanism for stalling the JPEG decoder on a single color pixel-by-pixel basis. Control of the flow of output data is also provided by the PixOutEnab input to the JPEG decoder. However, this can only stall the output from the JPEG core at an 8x8 pixel block boundary, and is insufficient for SoPEC. Thus gating of the clock is employed and PixOutEnab is instead tied high.

The CS6150 decoder automatically extracts all relevant parameters from the JPEG bytestream and uses them to control the decoding of the image. The JPEG bytestream contains data for the Huffman tables, quantization tables, restart interval definition, and frame and scan headers. The decoder parses and checks the JPEG bytestream, automatically detecting and processing all the JPEG marker segments. After identifying the JPEG segments, the decoder re-directs the data to the appropriate units to be stored or processed as appropriate.

version Of the CS6150).itmustbee,ualto\etoS?gVSgti"^^^^^^^^ 

f^r^^^T^'^'" '^-ing diagrams of the 

length as this is a moiifiLon to Ae^o J ' '° """^es of more than 64k lines 

pixels m the correct color order. TTie data is uncompressed L is therefoS lo^Te^. °' 
The following subsections describe the means by which the CS6150 internals can be made visible. 
22.5.6.1 JPEG decoder parameter bus

Table 99. Parameter bus definitions

Each entry gives the JpgDecPType value (where legible), the layout of JpgDecPValue[15:0], and the description of its fields.

JpgDecPValue = {YMCU}
    YMCU: number of MCUs in Y direction of the current scan.

JpgDecPType = 0x4: JpgDecPValue = {00, XMCU[13:0]}
    XMCU: number of MCUs in X direction of the current scan.

JpgDecPType = 0x5: JpgDecPValue = {Cs0[7:0], Tq0[1:0], V0[2:0], H0[2:0]}
    Cs0: identifier for the first scan component. Tq0: quantization table identifier for the first scan component. V0: vertical sampling factor for the first scan component. H0: horizontal sampling factor for the first scan component.

JpgDecPType = 0x6: JpgDecPValue = {Cs1[7:0], Tq1[1:0], V1[2:0], H1[2:0]}
    Cs1, Tq1, V1 and H1 for the second scan component. V1, H1 undefined if NS < 2.

JpgDecPType = 0x7: JpgDecPValue = {Cs2[7:0], Tq2[1:0], V2[2:0], H2[2:0]}
    Cs2, Tq2, V2 and H2 for the third scan component. V2, H2 undefined if NS < 3.

JpgDecPType = 0x8: JpgDecPValue = {Cs3[7:0], Tq3[1:0], V3[2:0], H3[2:0]}
    Cs3, Tq3, V3 and H3 for the fourth scan component. V3, H3 undefined if NS < 4.

JpgDecPType = 0x9: JpgDecPValue = {CsH[15:0]}
    CsH: no. of rows in current scan.

JpgDecPType = 0xA: JpgDecPValue = {CsV[15:0]}
    CsV: no. of columns in current scan.

JpgDecPValue = {DRI[15:0]}
    DRI: restart interval.

JpgDecPValue = {000, HMAX[2:0], VMAX[2:0], MCUBLK[3:0], NS[2:0]}
    HMAX: maximal horizontal sampling factor in frame. VMAX: maximal vertical sampling factor in frame. MCUBLK: number of 8x8 blocks per MCU of the current scan. NS: number of scan components in current scan, 1-4.
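To make the packed layouts in Table 99 concrete, the sketch below unpacks two of them from a 16-bit JpgDecPValue. It assumes that the fields in each layout are listed from most significant to least significant bit; the helper names are hypothetical.

```python
def unpack_scan_component(pvalue: int) -> dict[str, int]:
    """{Cs[7:0], Tq[1:0], V[2:0], H[2:0]} packed MSB-first into 16 bits."""
    return {
        "Cs": (pvalue >> 8) & 0xFF,   # bits 15-8
        "Tq": (pvalue >> 6) & 0x3,    # bits 7-6
        "V": (pvalue >> 3) & 0x7,     # bits 5-3
        "H": pvalue & 0x7,            # bits 2-0
    }


def unpack_frame_info(pvalue: int) -> dict[str, int]:
    """{000, HMAX[2:0], VMAX[2:0], MCUBLK[3:0], NS[2:0]}."""
    return {
        "HMAX": (pvalue >> 10) & 0x7,   # bits 12-10
        "VMAX": (pvalue >> 7) & 0x7,    # bits 9-7
        "MCUBLK": (pvalue >> 3) & 0xF,  # bits 6-3
        "NS": pvalue & 0x7,             # bits 2-0
    }


# Four non-subsampled components: 1x1 sampling, 4 blocks per MCU, NS = 4.
print(unpack_frame_info((1 << 10) | (1 << 7) | (4 << 3) | 4))
```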



22.5.6.2 JPEG decoder status register

The status register flags indicate the current state of the CS6150 operation. When an error is detected during the decoding process, the decompression process in the JPEG decoder is suspended and an interrupt is sent to the CPU by asserting cdu_icu_jpegerror. The source of the error can be determined by reading the JpgDecStatus register. The CS6150 is reset by either a hard reset (prst_n) or by a soft reset of the CDU. A status flag goes high to indicate an error condition as defined in Table 100.

Table 100. JPEG decoder status register definitions

Bits 15-12 - TblDef[7:4]: Indicates the number of Huffman tables defined, 1 bit/table.
Bits 11-8 - TblDef[3:0]: Indicates the number of quantization tables defined, 1 bit/table.
Bit 7 - DecHfError: Set when an undefined Huffman table symbol is referenced during decoding.
Bit 6: Set when an invalid SOF parameter or an invalid SOS parameter is detected. Also set when there is a mismatch between the DNL segment input to the core and the image being decoded. Note that SoPEC's implementation of the CS6150 does not require a final DNL.
Bit 5 - HtError: Set when an invalid DHT segment is detected.
Bit 4 - QtError: Set when an invalid DQT segment is detected.
Bit 3 - DecError: Set when anything other than a JPEG marker is input. Set when any of DecFlags[6:4] are set. Set when any data other than the SOI marker is detected at the start of a stream.
Bit 2 - IDctInProg: Set while the inverse DCT is in progress.
Bit 1 - DecInProg: Asserted after the SigSOS (Start of Scan Segment) signal has been output from the core, and de-asserted when the decoding of a scan is complete. It indicates that the core is in the decoding state.
Bit 0 - JpgInProg: Set when the core starts to process input data (JpgIn), and de-asserted when decoding has been completed, i.e. when the last pixel of the last block of the image is output.
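A host-side decode of JpgDecStatus might look like the following sketch. Bits 21, 20 and 19-16 follow the JpgDecStatus description, and the low 16 bits follow Table 100. The label used for bit 6 is hypothetical because its register name is not given, and the function name is illustrative only.

```python
ERROR_BITS = {7: "DecHfError", 6: "SOF/SOS error", 5: "HtError", 4: "QtError", 3: "DecError"}
PROGRESS_BITS = {2: "IDctInProg", 1: "DecInProg", 0: "JpgInProg"}


def decode_jpg_dec_status(status: int) -> dict:
    """Split a 22-bit JpgDecStatus value into named fields."""
    return {
        "jpg_core_stall": (status >> 21) & 1,    # jclk gated, half-block buffers full
        "pixel_out_valid": (status >> 20) & 1,   # core is outputting a pixel
        "fifo_contents": (status >> 16) & 0xF,   # 64-bit values in the input FIFO
        "huffman_tables": (status >> 12) & 0xF,  # TblDef[7:4], 1 bit per table
        "quant_tables": (status >> 8) & 0xF,     # TblDef[3:0], 1 bit per table
        "errors": [name for bit, name in ERROR_BITS.items() if (status >> bit) & 1],
        "in_progress": [name for bit, name in PROGRESS_BITS.items() if (status >> bit) & 1],
    }


value = (1 << 21) | (3 << 16) | (0x3 << 12) | (0x3 << 8) | (1 << 5) | (1 << 3) | 1
print(decode_jpg_dec_status(value)["errors"])  # ['HtError', 'DecError']
```

Any non-empty "errors" list corresponds to the situation in which cdu_icu_jpegerror is raised and the CDU must be reset.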



22.5.7 Half-block buffer interface

The JPEG decoder core must be able to be stalled at the output of a single pixel. We provide a mechanism for stalling the JPEG decoder core by gating the clock to the core when the core stall signal (jpg_core_stall) is asserted, which happens when the set of double-buffered half JPEG block buffers is full.

The half-block buffer interface therefore consists of 2 single JPEG half-block buffers and some simple combinatorial logic, as shown in Figure 102.

[Figure 102: block diagram of the half-block buffer interface; signals shown include pixel_out_valid, jpg_core_stall, jclk_enable and pixel_data.]

Figure 102. Block diagram of half-block buffer interface



Doc: SoPEC^hardware^design 
Version: 2.3 



S3 Proprietary Document 



29 Nov 2002 
Page 280 




22.5.7.1 Half-block buffer select unit

The half-block buffer select unit controls which buffer is written by the JPEG decoder and which buffer is read by the write control unit. In this case, each buffer is a half JPEG block, i.e. 32 bytes (8 pixels x 4 lines). When both buffers are full, jpg_core_stall is asserted and jclk is turned off so as to stop the production of pixels. jclk_enable is the inverse of jpg_core_stall: when jclk_enable is 0, jclk is 0. The output half_block_ok_to_read indicates to the write control unit that a complete half-block is available to be written to DRAM.
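The double-buffer handshake between the JPEG core and the write control unit can be modelled as below. This is a behavioural sketch that assumes one pixel per unstalled jclk cycle and in-order draining; the class name and its counters are hypothetical, while jpg_core_stall, jclk_enable, half_block_ok_to_read and rd_adv_half_block are the signal names used in this section.

```python
HALF_BLOCK_BYTES = 32  # one half JPEG block: 8 pixels x 4 lines of 8-bit pixels


class HalfBlockDoubleBuffer:
    """Hypothetical model: the JPEG core fills one buffer while the other is drained."""

    def __init__(self) -> None:
        self.fill = [0, 0]  # bytes held in each half-block buffer
        self.wr_sel = 0     # buffer currently written by the JPEG core
        self.rd_sel = 0     # buffer next read by the write control unit

    def _full(self, i: int) -> bool:
        return self.fill[i] == HALF_BLOCK_BYTES

    @property
    def jpg_core_stall(self) -> bool:
        return self._full(0) and self._full(1)

    @property
    def jclk_enable(self) -> bool:
        return not self.jpg_core_stall  # jclk_enable is the inverse of jpg_core_stall

    @property
    def half_block_ok_to_read(self) -> bool:
        return self._full(self.rd_sel)

    def jclk_cycle(self) -> bool:
        """Accept one pixel if jclk is running; return whether a pixel was produced."""
        if not self.jclk_enable:
            return False
        self.fill[self.wr_sel] += 1
        if self._full(self.wr_sel):
            self.wr_sel ^= 1
        return True

    def rd_adv_half_block(self) -> None:
        """The write control unit has copied the selected buffer to DRAM."""
        if not self.half_block_ok_to_read:
            raise RuntimeError("no complete half-block to read")
        self.fill[self.rd_sel] = 0
        self.rd_sel ^= 1


buf = HalfBlockDoubleBuffer()
produced = sum(buf.jclk_cycle() for _ in range(70))
print(produced, buf.jpg_core_stall)  # 64 True
```

After 64 pixels both half-blocks are full and the core stays stalled until the write control unit frees one buffer.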

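22.5.7.2 Contone plane buffer

Each contone plane buffer consists of two half JPEG block buffers, as shown in block diagram form in Figure 103.

[Figure 103: contone plane buffer interface; signals shown include rd_buf, pixel_data and cdu_diu_data[63:0].]

Figure 103. Contone plane buffer interface

Incoming pixels (pixel_data) are collected in a shift register and stored in the half-block buffer in 64-bit quantities; data is read out of the buffer in 64-bit quantities onto cdu_diu_data[63:0].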
22.5.8 Write control unit

A line of JPEG blocks in 4 colors, or 8 lines of decompressed contone data, is stored in DRAM with the memory arrangement shown in Figure 104. The arrangement is chosen so that the 4 color components are stored together in each 256-bit DRAM word.



[Figure 104: two 4-line stores in DRAM. The first holds lines 0 to 3 of JPEG blocks 0 to n in DRAM words p to p+4n+3; the second holds lines 4 to 7 in DRAM words q to q+4n+3. Each block occupies 4 consecutive words, one line per word (e.g. word p+k holds line k of block 0), with color 3 in bits 255-192, color 2 in bits 191-128, color 1 in bits 127-64 and color 0 in bits 63-0. Each CDU access to DRAM implies 4 x 64-bit writes to consecutive words in one DRAM row. CX - Color X; LY - Line Y, or 8 bytes of a line in a JPEG block.]

Figure 104. DRAM storage arrangement for a single line of JPEG 8x8 blocks in 4 colors



The CDU writes the first 4 lines and the second 4 lines of each line of JPEG blocks to separate areas of DRAM. The order in which the data is written for 4 colors, as shown in Figure 104, is as follows, and corresponds to the order in which pixels are output from the JPEG decoder core:

block 0, color 0, line 0 in word p bits 63-0, line 1 in word p+1 bits 63-0,
    line 2 in word p+2 bits 63-0, line 3 in word p+3 bits 63-0,
block 0, color 0, line 4 in word q bits 63-0, line 5 in word q+1 bits 63-0,
    line 6 in word q+2 bits 63-0, line 7 in word q+3 bits 63-0,
block 0, color 1, line 0 in word p bits 127-64, line 1 in word p+1 bits 127-64,
    line 2 in word p+2 bits 127-64, line 3 in word p+3 bits 127-64,
block 0, color 1, line 4 in word q bits 127-64, line 5 in word q+1 bits 127-64,
    line 6 in word q+2 bits 127-64, line 7 in word q+3 bits 127-64,
repeat for block 0 color 2, block 0 color 3,
block 1, color 0, line 0 in word p+4 bits 63-0, line 1 in word p+5 bits 63-0,
    ...
block n, color 3, line 4 in word q+4n bits 255-192, line 5 in word q+4n+1 bits 255-192,
    line 6 in word q+4n+2 bits 255-192, line 7 in word q+4n+3 bits 255-192
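The write order above reduces to a simple address formula. The sketch below, assuming p and q are the 256-bit word addresses of the two 4-line stores as in Figure 104, returns where the 8 bytes of a given block, color and line are stored, and enumerates the order in which they leave the JPEG decoder core; the function names are hypothetical.

```python
from typing import Iterator


def dram_location(block: int, color: int, line: int, p: int, q: int) -> tuple[int, int, int]:
    """Return (256-bit DRAM word, msb, lsb) holding 8 pixels of one block line."""
    if not (block >= 0 and 0 <= color < 4 and 0 <= line < 8):
        raise ValueError("expected block >= 0, color 0-3 and line 0-7")
    base = p if line < 4 else q          # lines 0-3 store or lines 4-7 store
    word = base + 4 * block + line % 4   # 4 consecutive words per block
    lsb = 64 * color                     # color 0 in bits 63-0 ... color 3 in 255-192
    return word, lsb + 63, lsb


def write_order(max_block: int, max_plane: int) -> Iterator[tuple[int, int, int]]:
    """(block, color, line) in the order pixels leave the JPEG decoder core."""
    for block in range(max_block + 1):
        for color in range(max_plane + 1):
            for line in range(8):
                yield block, color, line


print(dram_location(0, 1, 3, p=100, q=200))  # (103, 127, 64)
```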
Because each 256-bit DRAM word holds all 4 color components, each CDU access writes only 64 bits of each of 4 consecutive DRAM words, i.e. one half JPEG block of a single color plane. Once the half-block buffer interface indicates that a half-block is ready (half_block_ok_to_read), the state machine requests a write access to DRAM by asserting cdu_diu_wreq and providing the write address, corresponding to the first 64-bit value to be written, on cdu_diu_wadr. Only the address of the first 64-bit value in each access of 4 x 64 bits is issued by the CDU; the DIU generates the addresses for the second, third and fourth 64-bit values. The state machine then waits for an acknowledge from the DIU before initiating a read of 4 64-bit values from the half-block buffer. The output cdu_diu_wvalid is asserted in the cycle after the data is read from the buffer, to indicate to the DIU that valid data is present on the cdu_diu_data bus and should be written to the specified address in DRAM. A rd_adv_half_block pulse is then sent to the half-block buffer interface to indicate that the current half-block buffer should now be available to be written to again.

// assign write address output to DRAM
cdu_diu_wadr[6:5] = 00      // corresponds to line number; only the first address is
                            // issued for each DRAM access, thus line is always 0.
                            // The DIU generates these bits of the address.
cdu_diu_wadr[4:3] = color

if (half == 1) then
    cdu_diu_wadr[21:7] = upr_halfblock_adr    // for lines 4-7 of JPEG block
else
    cdu_diu_wadr[21:7] = lwr_halfblock_adr    // for lines 0-3 of JPEG block

// update half, color, block and addresses
if (rd_adv_half_block == 1) then
    if (half == 1) then
        half = 0
        if (color == max_plane) then
            color = 0
            if (block == max_block) then      // end of writing a line of JPEG blocks
                pulse wradv8line
                block = 0
                // update half block address for start of next line of JPEG blocks,
                // taking account of address wrapping in circular buffer and 4 line offset
                if (upr_halfblock_adr == buff_end_adr) then
                    upr_halfblock_adr = buff_start_adr + max_block + 1
                elsif (upr_halfblock_adr + max_block + 1 == buff_end_adr) then
                    upr_halfblock_adr = buff_start_adr
                else
                    upr_halfblock_adr = upr_halfblock_adr + max_block + 2
            else
                block ++
                upr_halfblock_adr ++          // move to address for lines 4-7 for next block
        else
            color ++
    else
        half = 1
        if (color == max_plane) then
            if (block == max_block) then      // end of writing a line of JPEG blocks
                // update half block address for start of next line of JPEG blocks,
                // taking account of address wrapping in circular buffer and 4 line offset
                if (lwr_halfblock_adr == buff_end_adr) then
                    lwr_halfblock_adr = buff_start_adr + max_block + 1
                elsif (lwr_halfblock_adr + max_block + 1 == buff_end_adr) then
                    lwr_halfblock_adr = buff_start_adr
                else
                    lwr_halfblock_adr = lwr_halfblock_adr + max_block + 2
            else
                lwr_halfblock_adr ++          // move to address for lines 0-3 for next block
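The address generation pseudocode can be ported to a runnable model to check the wrap-around cases. In this sketch the initial values of lwr_halfblock_adr and upr_halfblock_adr (the start of the buffer, and one 4-line store further on) are assumptions, since the pseudocode does not show them. Each returned value is the 64-bit word address cdu_diu_wadr[21:3] of one 4 x 64-bit DRAM write access, and the function names are hypothetical.

```python
def next_line_adr(adr: int, buff_start_adr: int, buff_end_adr: int, max_block: int) -> int:
    """Start of the same 4-line store for the next line of JPEG blocks, with wrap."""
    if adr == buff_end_adr:
        return buff_start_adr + max_block + 1
    if adr + max_block + 1 == buff_end_adr:
        return buff_start_adr
    return adr + max_block + 2


def write_addresses(buff_start_adr: int, buff_end_adr: int, max_block: int,
                    max_plane: int, lines_of_blocks: int) -> list[int]:
    lwr = buff_start_adr                   # assumed initial value: lines 0-3 store
    upr = buff_start_adr + max_block + 1   # assumed initial value: lines 4-7 store
    half = color = block = 0
    lines_done = 0
    out = []
    while lines_done < lines_of_blocks:
        halfblock = upr if half == 1 else lwr
        # wadr[21:7] = half-block address, wadr[6:5] = 00, wadr[4:3] = color
        out.append((halfblock << 4) | color)
        # rd_adv_half_block: update half, color, block and addresses
        if half == 1:
            half = 0
            if color == max_plane:
                color = 0
                if block == max_block:     # end of a line of JPEG blocks
                    lines_done += 1        # corresponds to pulsing wradv8line
                    block = 0
                    upr = next_line_adr(upr, buff_start_adr, buff_end_adr, max_block)
                else:
                    block += 1
                    upr += 1
            else:
                color += 1
        else:
            half = 1
            if color == max_plane:
                if block == max_block:
                    lwr = next_line_adr(lwr, buff_start_adr, buff_end_adr, max_block)
                else:
                    lwr += 1
    return out


print(write_addresses(0, 7, 1, 0, 3))
```

With max_block = 1 and a buffer of 8 half-blocks (BuffEndAdr = 7), two lines of JPEG blocks fill the buffer and the third line wraps back to half-block addresses 0 and 2.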
[Figure 105: state diagram of the write control unit with states idle, req, ack, read, write1, write2, write3 and write4, driving cdu_diu_wreq, cdu_diu_wvalid, rd_adv and rd_adv_half_block; transitions depend on half_block_ok_to_read and diu_cdu_wack.]

Figure 105. State machine to write decompressed contone data
22.5.9 Contone line store interface

The contone line store interface is responsible for providing the control over the shared resource in DRAM. The CDU writes 8 lines of data in up to 4 color planes, and the CFU reads them line-at-a-time. The contone line store interface provides the mechanism for keeping track of the number of lines stored in DRAM, and provides signals so that a given line cannot be read from until the complete line has been written.

The CDU writes the first 4 lines and the second 4 lines to separate areas in DRAM. Thus, when the CFU has read 4 lines from DRAM, that area becomes free for the CDU to write to. The size of the line store in DRAM should therefore be a multiple of 4 lines. The minimum size of the line store interface is 8 lines, providing a single buffer scheme. Typical sizes are 12 lines for a 1.5 buffer scheme, while 16 lines provides a double-buffer scheme.

The size of the line store in DRAM is given by num_buff_lines. A count is kept of the number of lines stored in DRAM that are available to be written to. When Go transitions from 0 to 1, num_lines_avail is set to the value of num_buff_lines. The CDU may only begin to write to DRAM when there is space available for 8 lines, indicated when the line_store_ok_to_write bit is set. When the CDU has finished writing 8 lines, the write control unit sends a wradv8line pulse to the contone line store interface and the CFU, and num_lines_avail is decremented by 8. The write control unit then waits for line_store_ok_to_write to be set again. The CFU is responsible for responding to wradv8line pulses appropriately, and sends its own rdadvline signal to the CDU's contone line store interface to free up each line as it finishes reading them. num_lines_avail is incremented by 1 on receiving a rdadvline pulse.
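The counting scheme above can be sketched as a small behavioral model. This is a minimal Python sketch, not the RTL: it assumes both line counters can be modeled in one object (in hardware, num_lines_avail lives in the CDU and buff_lines_avail in the CFU, as described in section 23.7.6), and the class and method names are hypothetical.

```python
class ContoneLineStoreModel:
    """Hypothetical behavioral model of the contone line store counters.

    num_lines_avail: lines free for the CDU to write (CDU side).
    buff_lines_avail: lines written and not yet freed by the CFU (CFU side).
    """

    def __init__(self, num_buff_lines: int) -> None:
        # The line store must be a multiple of 4 lines, minimum 8 (single buffer).
        if num_buff_lines < 8 or num_buff_lines % 4 != 0:
            raise ValueError("num_buff_lines must be a multiple of 4 and >= 8")
        # Values loaded when Go transitions from 0 to 1.
        self.num_lines_avail = num_buff_lines
        self.buff_lines_avail = 0

    @property
    def line_store_ok_to_write(self) -> bool:
        # The CDU may only start an 8-line write when 8 lines are free.
        return self.num_lines_avail >= 8

    @property
    def line8_ok_to_read(self) -> bool:
        return self.buff_lines_avail > 0

    def wradv8line(self) -> None:
        """CDU finished writing 8 lines."""
        if not self.line_store_ok_to_write:
            raise RuntimeError("CDU wrote 8 lines without space available")
        self.num_lines_avail -= 8
        self.buff_lines_avail += 8

    def rdadvline(self) -> None:
        """CFU finished reading (all repeats of) one line."""
        if not self.line8_ok_to_read:
            raise RuntimeError("CFU freed a line that was never written")
        self.buff_lines_avail -= 1
        self.num_lines_avail += 1


# 12-line store (1.5 buffer scheme).
store = ContoneLineStoreModel(12)
store.wradv8line()
print(store.num_lines_avail, store.line_store_ok_to_write)  # 4 False
```

With 12 lines, the CDU can only start its second 8-line write once the CFU has freed 4 lines, which is the 1.5 buffer behavior described above.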
23 Contone FIFO Unit (CFU)



23.1 Overview 



23.2 Bandwidth requirements 



The bandwidth requirement of the CFU is set by the rate at which the contone data must be supplied to the HCU. Contone data is scaled up to the printhead resolution of 1600 dpi. Replication in the X direction is performed at the output of the CFU, while replication in the Y direction is performed by the CFU re-reading each line a number of times from DRAM, according to the Y-scale factor. The HCU generates 1 dot (bi-level) per cycle, giving a print speed of 1 side per 2 seconds for full bleed A4/Letter printing. The CFU output buffer must therefore be supplied with a 4-color contone pixel (32 bits) every 6 cycles, so the CFU must read data from DRAM at 5.33 bits/cycle (see footnote 1).
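The footnoted figure can be checked with a few lines of arithmetic. This Python sketch assumes, as the text does, one 32-bit 4-color contone pixel every 6 cycles, and additionally derives the interval between 256-bit DRAM reads, a figure the text does not state; the function name is hypothetical.

```python
from fractions import Fraction


def cfu_dram_read_rate(pixel_bits: int = 32, cycles_per_pixel: int = 6,
                       dram_word_bits: int = 256) -> tuple:
    """Return (bits per cycle, cycles per 256-bit DRAM read) as exact fractions."""
    bits_per_cycle = Fraction(pixel_bits, cycles_per_pixel)
    cycles_per_word = dram_word_bits / bits_per_cycle
    return bits_per_cycle, cycles_per_word


rate, interval = cfu_dram_read_rate()
print(float(rate), interval)  # 5.333333333333333 48
```

At this rate one 256-bit DRAM read is needed roughly every 48 cycles.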

23.3 Color space conversion 

The contone planes may be C, M, Y and K, directly represented by CMYK contone data, or may hold other colors, for example for multi-SoPEC printing with exact colors, or special inks such as gold, metallic green etc.

JPEG compression achieves better visible quality when luminance and chrominance are separated, but C, M and Y each contain both luminance and chrominance information. The CFU therefore provides the means by which CMY data can be compressed in a color space that separates them, such as YCrCb. K does not need color conversion. CMY is converted to YCrCb for compression, is color converted to RGB after decompression, and finally back to CMY.

The external RIP provides conversion from RGB to YCrCb, matching the hardware implementation of the inverse transform within SoPEC. Y, Cr and Cb are normalized to occupy all 256 levels of an 8-bit binary encoding.

The CFU provides the translation to either RGB or CMY.



1. 32 bits / 6 cycles = 5.33 bits/cycle
Consequently, the JPEG stream in the color space converter is one of:

• 1 color plane, no color space conversion 

• 2 color planes, no color space conversion

• 3 color planes, no color space conversion 

• 3 color planes YCrCb, conversion to RGB 

• 4 color planes, no color space conversion 

• 4 color planes YCrCbX, conversion of YCrCb to RGB, no color conversion of X

Color space conversion is described in [14]. Note that if the data is non-compressed, there is no specific advantage in performing color conversion (although the CDU and CFU do permit it).



23.4 Color space inversion 



In addition to performing optional color conversion, the CFU also provides for optional bit-wise inversion in up to 4 color planes. This provides the means by which the conversion to CMY may be completed, and may also be used to provide planar correlation of the dither matrices.

The RGB to CMY conversion is given by the relationships:

• C = 255-R

• M = 255-G

• Y = 255-B

These relationships require the page RIP to calculate the RGB from CMY as follows:

• R = 255-C

• G = 255-M

• B = 255-Y
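Because the samples are 8 bits wide, subtracting from 255 is the same operation as flipping every bit, which is why a bit-wise inverter suffices. A minimal Python sketch of that equivalence and of the round trip through the page RIP (the function name is hypothetical):

```python
def invert_plane(value: int) -> int:
    """Bit-wise inversion of one 8-bit contone sample, as applied per plane."""
    if not 0 <= value <= 255:
        raise ValueError("contone samples are 8-bit")
    return value ^ 0xFF


# For 8-bit data, bit-wise inversion is exactly 255 - value ...
assert all(invert_plane(v) == 255 - v for v in range(256))

# ... so CMY -> RGB in the page RIP followed by inversion in the CFU is lossless.
cmy = (30, 200, 255)
rgb = tuple(255 - c for c in cmy)               # R = 255-C, G = 255-M, B = 255-Y
restored = tuple(invert_plane(x) for x in rgb)  # C = 255-R, M = 255-G, Y = 255-B
print(restored)  # (30, 200, 255)
```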



23.5 Scaling 



Scaling of pixel data is performed by the CFU so that the output to the HCU matches the printer resolution. The CFU supports non-integer scaling with the scale factor represented by a numerator and a denominator. Only scaling up of the pixel data is allowed, i.e. the numerator should be greater than or equal to the denominator. For example, to scale up by a factor of two and a half, the numerator is programmed as 5 and the denominator programmed as 2.

Scaling is implemented using a counter as described in the pseudocode below. An advance pulse is generated to move to the next dot (x-scaling) or line (y-scaling).

    if (count + denominator - numerator >= 0) then
        count = count + denominator - numerator
        advance = 1
    else
        count = count + denominator
        advance = 0
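The counter can be exercised in software to check a choice of numerator and denominator. A minimal Python sketch, assuming a hypothetical 640 ppi contone source printed at 1600 dpi; the helper names are illustrative, and the counter follows the pseudocode above.

```python
from fractions import Fraction


def scale_registers(source_res: int, target_res: int) -> tuple:
    """Reduce target/source resolution to a (numerator, denominator) pair."""
    ratio = Fraction(target_res, source_res)
    if ratio < 1:
        raise ValueError("only scaling up is supported")
    return ratio.numerator, ratio.denominator


def advance_pulses(numerator: int, denominator: int, steps: int,
                   start_count: int = 0) -> list:
    """Run the scaling counter for `steps` output dots (or lines)."""
    count = start_count
    pulses = []
    for _ in range(steps):
        if count + denominator - numerator >= 0:
            count = count + denominator - numerator
            pulses.append(1)  # move to the next source pixel (or line)
        else:
            count = count + denominator
            pulses.append(0)  # repeat the current source pixel (or line)
    return pulses


num, den = scale_registers(640, 1600)  # hypothetical 640 ppi -> 1600 dpi
print(num, den)                        # 5 2
print(advance_pulses(num, den, 10))    # [0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
```

Two advances every five output dots means each source pixel covers 2.5 dots on average, alternating between 3 and 2.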



Doc: SoPEC_hardware_design 
Version: 2.3 



S3 Proprietary Document 



29 Nov 2002 
Page 288 



SoPEC : Hardware Design 



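23.6 Lead-in and lead-out clipping

In a multi-SoPEC environment, a JPEG block can straddle the boundary between the part of the page printed by SoPEC #1 and the part printed by SoPEC #2, as shown in Figure 106. Pixels of the last JPEG block in a line printed by SoPEC #1 that fall in its lead-out area must be ignored at the end of each line, which is done by appropriately setting the LeadOutClipNum register. Likewise, pixels of the first JPEG block in a line printed by SoPEC #2 that fall in its lead-in area must be ignored at the beginning of each line; the number of such pixels is specified by the LeadInClipNum register.

It may also be the case that the CDU writes out more JPEG blocks than are required to be read by the CFU, as shown for SoPEC #2 in Figure 106. In this case the value of the MaxBlock register in the CDU is set to correspond to JPEG block n, but the value for the MaxBlock register in the CFU is set to correspond to JPEG block n-1. Thus JPEG block n is not read in by the CFU.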



[Figure 106. Lead-in and lead-out clipping of contone data in multi-SoPEC environment: SoPEC #1 prints the left side of the page and SoPEC #2 the right side; each SoPEC has a lead-in area and a lead-out area]

The remaining pixels are scaled up to the printer resolution. The XstartCount register controls the scaling of the first valid pixel in a line, and the HcuLineLength register controls the scaling of the last valid pixel in a line sent to the HCU.
23.7 Implementation

Figure 107 shows a block diagram of the CFU. 



[Figure 107. Block diagram of CFU: DRAM Interface Unit; decompressed contone buffer (wr_en, rd_en, wr_sel[1:0], rd_sel[2:0]) with the Y-scaling control unit; color space converter (YCrCb2RGB, invert_color_plane; planes cp0 to cp3); configuration registers; output double-buffer (wr_buff, rd_buff) with the X-scaling control unit; Halftone/Compositor Unit; PEP Controller Unit]
23.7.1 Definitions of I/O

Table 101. CFU port list and description

Port name | Pins | I/O | Description
PCU Interface
pcu_cfu_sel | 1 | In | Block select from the PCU. When pcu_cfu_sel is high both pcu_adr and pcu_dataout are valid.
pcu_rwn | 1 | In | Common read/not-write signal from the PCU.
cfu_pcu_rdy | 1 | Out | Ready signal to the PCU. When cfu_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on cfu_pcu_data is valid.
cfu_pcu_data[31:0] | 32 | Out | Read data bus to the PCU.
DIU Interface
cfu_diu_rreq | 1 | Out | CFU read request, active high. A read request must be accompanied by a valid read address.
diu_cfu_rack | 1 | In | Acknowledge from DIU, active high. Indicates that a read request has been accepted and the new read address can be placed on the address bus, cfu_diu_radr.
cfu_diu_radr[21:5] | 17 | Out | CFU read address. 17 bits wide (256-bit aligned word).
diu_cfu_rvalid | 1 | In | Read data valid, active high. Indicates that valid read data is now on the read data bus, diu_data.
diu_data[63:0] | 64 | In | Read data from DRAM.
CDU Interface
cdu_cfu_wradv8line | 1 | In | Write 8 line pulse, active high. Indicates that the CDU has finished writing 8 lines of decompressed contone data to the circular buffer in DRAM and the data is available to be read by the CFU.
cfu_cdu_rdadvline | 1 | Out | Read line pulse, active high. Indicates that the CFU has finished reading a line of decompressed contone data from the circular buffer in DRAM and that line of the buffer is now free.
HCU Interface
hcu_cfu_advdot | 1 | In | Informs the CFU that the HCU has captured the pixel data on cfu_hcu_c[0-3]data lines and the CFU can now place the next pixel on the data lines.
cfu_hcu_avail | 1 | Out | Indicates valid data present on cfu_hcu_c[0-3]data lines.
cfu_hcu_c0data[7:0] | 8 | Out | Pixel of data in contone plane 0.
cfu_hcu_c1data[7:0] | 8 | Out | Pixel of data in contone plane 1.
23.7.2 Configuration registers

The configuration registers in the CFU are programmed via the PCU interface. Refer to section 21.8.2 on page 257 for the description of the protocol and timing diagrams for reading and writing registers in the CFU. Note that since addresses in SoPEC are byte aligned and the PCU only supports 32-bit register reads and writes, the lower 2 bits of the PCU address bus are not required to decode the address space for the CFU. When reading a register that is less than 32 bits wide, zeros should be returned on the upper unused bit(s) of cfu_pcu_data. The configuration registers of the CFU are listed in Table 102:

Table 102. CFU registers

Address | Register name | #bits | Reset value | Description
Control registers
0x00 | Reset | | 0x1 | A write to this register causes a reset of the CFU.
0x04 | Go | 1 | 0x0 | Writing 1 to this register starts the CFU. Writing 0 to this register halts the CFU. When Go is deasserted the state-machines go to their idle states but all counters and configuration registers keep their values. When Go is asserted all counters are reset, but configuration registers keep their values (i.e. they don't get reset). The CFU must be started before the CDU is started. This register can be read to determine if the CFU is running (1 = running, 0 = stopped).
Setup registers
0x10 | MaxBlock | 13 | 0x000 | Number of JPEG MCUs (or JPEG block equivalents, i.e. 8x8 bytes) in a line - 1.
0x14 | BuffStartAdr | 15 | 0x0000 | Points to the start of the decompressed contone circular buffer in DRAM, aligned to a half JPEG block boundary. A half JPEG block consists of 4 words of 256 bits, enough to hold 32 contone pixels in 4 colors, i.e. half a JPEG block.
0x18 | BuffEndAdr | 15 | 0x0000 | Points to the end of the decompressed contone circular buffer in DRAM, aligned to a half JPEG block boundary (address is inclusive). A half JPEG block consists of 4 words of 256 bits, enough to hold 32 contone pixels in 4 colors, i.e. half a JPEG block.
0x1C | 4LineOffset | 13 | 0x0000 | Defines the offset between the start of one 4 line store and the start of the next 4 line store. In Figure 108 on page 294, if BuffStartAdr corresponds to line 0 block 0 then BuffStartAdr + 4LineOffset corresponds to line 4 block 0. This register is required in addition to MaxBlock as the number of JPEG blocks in a line required by the CFU may be different from the number of JPEG blocks in a line written by the CDU.
0x20 | YCrCb2RGB | 1 | 0x0 | Set this bit to enable conversion from YCrCb to RGB. Should not be changed between bands.
Table 102. CFU registers (continued)

Address | Register name | #bits | Reset value | Description
0x24 | InvertColorPlane | 4 | | bit0: 1 = invert color plane 0, 0 = do not convert; bit1: 1 = invert color plane 1, 0 = do not convert; bit2: 1 = invert color plane 2, 0 = do not convert; bit3: 1 = invert color plane 3, 0 = do not convert. Should not be changed between bands.
0x28 | HcuLineLength | 16 | 0x0000 | Number of contone pixels - 1 in a line (after scaling). Equals the number of hcu_cfu_dotadv pulses - 1 received from the HCU for each line of contone data.
0x2C | LeadInClipNum | | 0x0 | Number of contone pixels to be ignored at the start of a line (from JPEG block 0 in a line). They are not passed to the output buffer to be scaled in the X direction.
0x30 | LeadOutClipNum | | 0x0 | Number of contone pixels to be ignored at the end of a line (from JPEG block MaxBlock in a line). They are not passed to the output buffer to be scaled in the X direction.
0x34 | XstartCount | | 0x00 | Value to be loaded at the start of every line into the counter used for scaling in the X direction. Used to control the scaling of the first pixel in a line to be sent to the HCU. This value will typically be zero, except in the case where a number of dots are clipped on the lead in to a line.
0x38 | XscaleNum | | 0x01 | Numerator of contone scale factor in X direction.
0x3C | XscaleDenom | | 0x01 | Denominator of contone scale factor in X direction.
0x40 | YscaleNum | | 0x01 | Numerator of contone scale factor in Y direction.
0x44 | YscaleDenom | | 0x01 | Denominator of contone scale factor in Y direction.

23.7.3 Storage of decompressed contone data in DRAM

The CFU reads decompressed contone data from DRAM in single 256-bit accesses. Each 256-bit DRAM word holds 8 bytes (64 bits) of the same line of a JPEG block in each of the 4 colors, as shown in Figure 108. This means that the CFU reads 64 bits in 4 colors in a single 256-bit DRAM access.
[Figure 108. DRAM storage arrangement for a single line of JPEG blocks in 4 colors. In the first 4 line store, JPEG blocks 0, 1, ..., n for lines 0 to 3 start at DRAM words p, p+4, ..., p+4n; in the second 4 line store, JPEG blocks 0, 1, ..., n for lines 4 to 7 start at DRAM words q, q+4, ..., q+4n. Each block occupies 4 consecutive words, one per line; each word holds 8 bytes of that line in all 4 colors, with color 3 in bits 255:192, color 2 in bits 191:128, color 1 in bits 127:64 and color 0 in bits 63:0. Each word is read with one 256-bit DRAM access. CX = color X; LY = line Y, i.e. 8 bytes of a line in a JPEG block.]

The CFU reads the data from DRAM in the following sequence, as shown in Figure 108:

line 0, block 0 in word p of DRAM
line 0, block 1 in word p+4 of DRAM
...
line 0, block n in word p+4n of DRAM
(repeat to read the line a number of times according to the scale factor)
line 1, block 0 in word p+1 of DRAM
line 1, block 1 in word p+5 of DRAM
etc.

The CFU reads a complete line in up to 4 colors from DRAM a Y-scale-factor number of times before it moves on to read the next. When the CFU has finished reading 4 lines of contone data, that 4 line store becomes available for the CDU to write to.
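Within a 4 line store, the sequence above amounts to word = p + 4 x block + line. A minimal Python sketch that enumerates it; p and max_block are example values, and the function names are hypothetical.

```python
def cfu_read_order(p: int, max_block: int, lines: int = 4):
    """Yield (line, block, word) for one 4 line store, in CFU read order.

    Word addresses are 256-bit DRAM word addresses; the repeated reads made
    for Y scaling are not shown.
    """
    for line in range(lines):
        for block in range(max_block + 1):
            yield line, block, p + 4 * block + line


def word_to_byte_address(word: int) -> int:
    """A 256-bit word is 32 bytes, so the byte address is word << 5."""
    return word << 5


order = list(cfu_read_order(p=100, max_block=2))
print(order[:4])  # [(0, 0, 100), (0, 1, 104), (0, 2, 108), (1, 0, 101)]
```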

23.7.4 Decompressed contone buffer

Since the CFU reads 256 bits (4 colors x 64 bits) from memory at a time, it requires storage of 2 x 256 bits. It is implemented as 4 buffers. Each buffer conceptually is a 64-bit input and 8-bit output buffer to account for the 64-bit data transfers from the DIU, and the 8-bit output per color plane to the color space converter. In reality, each buffer is actually implemented as a double-buffer of 2 x 64 bits wide. On the DRAM side, wr_buff indicates the current buffer within each double-buffer that writes are to occur to. wr_sel selects which double-buffer to write the 64 bits of data to when wr_en is asserted.
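The buffer organization described above can be modeled as follows. This Python sketch is hypothetical: it takes the DIU transfer order (first 64 bits = bits 63:0, as listed for diu_data in the LBD port list) and the color layout of Figure 108 (color 0 in bits 63:0) from this document, but the byte order within a 64-bit plane and the moment wr_buff advances are assumptions.

```python
MASK64 = (1 << 64) - 1


def split_dram_word(word256: int) -> list:
    """Four 64-bit DIU transfers of one 256-bit word; the first is bits 63:0."""
    return [(word256 >> (64 * i)) & MASK64 for i in range(4)]


def bytes_of(word64: int) -> list:
    """The 8 bytes one color plane emits; ASSUMES pixel 0 is in bits 7:0."""
    return [(word64 >> (8 * i)) & 0xFF for i in range(8)]


class DecompressedContoneBuffer:
    """Hypothetical model: 4 double-buffers (one per color), each 2 x 64 bits."""

    def __init__(self) -> None:
        self.planes = [[0, 0] for _ in range(4)]
        self.wr_buff = 0

    def write(self, wr_sel: int, data64: int) -> None:
        # wr_sel picks the color plane's double-buffer; wr_buff picks the half.
        self.planes[wr_sel][self.wr_buff] = data64

    def load_dram_word(self, word256: int) -> None:
        for wr_sel, data64 in enumerate(split_dram_word(word256)):
            self.write(wr_sel, data64)
        # ASSUMPTION: the write half is advanced after all four transfers.
        self.wr_buff ^= 1
```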



23.7.5 Y-scaling control unit

The Y-scaling control unit is responsible for reading the decompressed contone data and passing it to the color space converter via the decompressed contone buffer. The decompressed contone data is read from DRAM in single 256-bit accesses, receiving the data from the DIU over 4 clock cycles (64 bits per cycle). Read accesses to DRAM are implemented by means of the state machine shown in Figure 109.

The state machine uses the line8_ok_to_read and buff_ok_to_write flags to tell it when to attempt to read data from DRAM. When line8_ok_to_read is 0, the state machine does nothing. When line8_ok_to_read is 1, the state machine continues to load data into the decompressed contone buffer, up to 256 bits at a time, while there is space available in the buffer.

The Y-scaling control unit keeps a bit (rd_buff) for the current buffer that reads are to occur from, and a single bit (wr_buff) for the current buffer that writes are to occur to.

Each line of decompressed contone data is read from DRAM a number of times, according to the Y scale factor; scaling to the printhead resolution in the Y direction is thus performed. The read address and line sequencing are described in the pseudocode below.

// assign read address output to DRAM
cfu_diu_radr[21:7] = curr_halfblock
cfu_diu_radr[6:5] = line[1:0]

// update half block address after each DRAM access
if (finished reading a line of contone in all colors) then
    // check whether to advance to next line of contone data in DRAM
    if (y_scale_count + y_scale_denom - y_scale_num >= 0) then
        y_scale_count = y_scale_count + y_scale_denom - y_scale_num
        if (line == 3) then   // end of reading a line store of contone data
            line = 0
            // update half block address for start of next line taking account of
            // address wrapping in circular buffer and 4 line offset
            if (curr_halfblock == buff_end_adr) then
                curr_halfblock = buff_start_adr
                line_start_adr = buff_start_adr
            elsif ((line_start_adr + 4line_offset) > buff_end_adr) then
                curr_halfblock = buff_start_adr
                line_start_adr = buff_start_adr
            else
                curr_halfblock = line_start_adr + 4line_offset
                line_start_adr = line_start_adr + 4line_offset
        else
            line ++
            curr_halfblock = line_start_adr
    else
        // re-read current line from DRAM
        y_scale_count = y_scale_count + y_scale_denom
        curr_halfblock = line_start_adr
else
    // advance to the next half block in the line
    curr_halfblock ++
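The address wrap at the end of each 4 line store can be isolated as a pure function. A minimal Python sketch of that one branch of the pseudocode, with example buffer bounds; the function name and the example values are hypothetical, and addresses are in half-JPEG-block units as for BuffStartAdr and BuffEndAdr.

```python
def next_line_store_start(curr_halfblock: int, line_start_adr: int,
                          four_line_offset: int, buff_start_adr: int,
                          buff_end_adr: int) -> int:
    """Half-block address of the next 4 line store, with circular wrap.

    Mirrors the branch taken when line == 3 in the pseudocode.
    """
    if curr_halfblock == buff_end_adr:
        return buff_start_adr          # last read used the final half block
    if line_start_adr + four_line_offset > buff_end_adr:
        return buff_start_adr          # next store would start past the end
    return line_start_adr + four_line_offset


# Example: 60 half blocks (0..59), 20 half blocks per 4 line store.
start = 0
starts = []
for _ in range(4):
    starts.append(start)
    last_halfblock = start + 19        # MaxBlock = 19
    start = next_line_store_start(last_halfblock, start, 20, 0, 59)
print(starts)  # [0, 20, 40, 0]
```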
[Figure 109. State machine to read decompressed contone data from DRAM: states idle, req, ack, read1, read2, read3 and read4; outputs cfu_diu_rreq, wr_sel and wr_adv_buff]
23.7.6 Contone line store interface

The contone line store interface is responsible for providing the control over the shared resource in DRAM. The CDU writes 8 lines of data in up to 4 color planes, and the CFU reads them line-at-a-time. The contone line store interface provides the mechanism for keeping track of the number of lines stored in DRAM, and provides signals so that a given line cannot be read from until the complete line has been written.

A count is kept of the number of lines that have been written to DRAM by the CDU and are available to be read by the CFU. At start-up, buff_lines_avail is set to 0. The CFU may only begin to read from DRAM when the CDU has written 8 complete lines of contone data. When the CDU has finished writing 8 lines, it sends a cdu_cfu_wradv8line pulse to the CFU, and buff_lines_avail is incremented by 8. The CFU may continue reading from DRAM as long as buff_lines_avail is greater than 0; line8_ok_to_read is set while buff_lines_avail is greater than 0. When it has completely finished reading a line of contone data from DRAM, the Y-scaling control unit sends a RdAdvLine signal to the contone line store interface and to the CDU to free up the line in the buffer in DRAM. buff_lines_avail is decremented by 1 on receiving a RdAdvLine pulse.



23.7.7 Color Space Converter (CSC) 



The color space converter consists of 2 stages: optional color conversion from YCrCb to RGB followed by optional bit-wise inversion in up to 4 color planes.

The convert YCrCb to RGB block takes 3 8-bit inputs, Y, Cr and Cb, and outputs either the same data or the converted RGB data. If YCrCb2RGB equals 0, the conversion does not take place, and the input pixels are passed to the second stage. The 4th color plane, if present, bypasses the convert YCrCb to RGB block. Note that the latency of the convert YCrCb to RGB block is 1 cycle. This latency should be equalized for the 4th color plane as it bypasses the block.

The second stage involves optional bit-wise inversion on a per color plane basis under the control of invert_color_plane. For example, if the input is YCrCbK, then YCrCb2RGB can be set to 1 to convert YCrCb to RGB, and invert_color_plane can be set to 0111 to then convert the RGB to CMY, leaving K unchanged.

If YCrCb2RGB equals 0 and invert_color_plane equals 0000, no color conversion or color inversion will take place, so the output pixels will be the same as the input pixels.
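The two stages can be sketched in software using the conversion equations given with Figure 110. This Python sketch is illustrative rather than bit-exact: it assumes round-half-up followed by saturation (consistent with the footnoted values -179, 135.5 and -227, which correspond to Y = Cr = Cb = 0), uses unbounded Python integers instead of the 18-bit datapath, ignores the 1-cycle latency, and the mapping of Y, Cr, Cb onto cp0 to cp2 is an assumption.

```python
def ycrcb_to_rgb(y: int, cr: int, cb: int) -> tuple:
    """YCrCb -> RGB with the 359/183/88/454 coefficients, in 1/256 units."""

    def round_and_saturate(value_x256: int) -> int:
        # Add 0.5 (128/256) and floor, i.e. round half up, then clamp to 8 bits.
        return min(255, max(0, (value_x256 + 128) >> 8))

    r = 256 * y + 359 * (cr - 128)
    g = 256 * y - 183 * (cr - 128) - 88 * (cb - 128)
    b = 256 * y + 454 * (cb - 128)
    return round_and_saturate(r), round_and_saturate(g), round_and_saturate(b)


def color_space_convert(cp: tuple, ycrcb2rgb: int, invert_color_plane: int) -> tuple:
    """Two-stage CSC: optional YCrCb->RGB on cp0..cp2, then per-plane inversion.

    ASSUMES Y, Cr, Cb arrive on cp0, cp1, cp2 and the 4th plane (e.g. K) on cp3.
    Bit i of invert_color_plane inverts plane i.
    """
    c0, c1, c2, c3 = cp
    if ycrcb2rgb:
        c0, c1, c2 = ycrcb_to_rgb(c0, c1, c2)  # cp3 bypasses this stage
    return tuple(v ^ 0xFF if (invert_color_plane >> i) & 1 else v
                 for i, v in enumerate((c0, c1, c2, c3)))


print(ycrcb_to_rgb(0, 0, 0))                               # (0, 136, 0)
print(color_space_convert((128, 128, 128, 7), 1, 0b0111))  # (127, 127, 127, 7)
```

The second call is the YCrCbK example from the text: YCrCb2RGB = 1 and invert_color_plane = 0111 give CMY on planes 0 to 2, with K passed through unchanged.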
Figure 110 shows a block diagram of the color space converter.

[Figure 110. Block diagram of color space converter: the convert YCrCb to RGB block, controlled by YCrCb2RGB, followed by the inverter, controlled by invert_color_plane, with 8-bit color planes cp0 to cp3]

Internal precision is maintained with 18 bits. The conversion is implemented as follows:

• R = Y + (359/256)(Cr-128)

• G = Y - (183/256)(Cr-128) - (88/256)(Cb-128)

• B = Y + (454/256)(Cb-128)

Results are rounded and saturated to the range 0 to 255. For example, with Y = Cr = Cb = 0:

1. R: -179 is saturated to 0
2. G: 135.5, with rounding, becomes 136
3. B: -227 is saturated to 0

23.7.8 X-scaling control unit

The X-scaling control unit keeps a single bit (wr_buff) for the current buffer of the output double-buffer that writes are to occur to. A pixel is written to the output double-buffer when wr_adv is 1, except that pixels in the lead-in and lead-out areas are clipped (not written), as shown in the pseudocode below:
if (wr_adv == 1) then
    if (pixel_count == {max_block,111}) then
        pixel_count = 0
    else
        pixel_count ++
    if ((pixel_count < leadin_clip_num) ||
        (pixel_count > ({max_block,111} - leadout_clip_num))) then
        wr_en = 0
    else
        wr_en = 1
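The {max_block,111} notation is a bit concatenation: appending three 1 bits to MaxBlock gives 8 x MaxBlock + 7, the index of the last pixel in a line of 8-pixel-wide JPEG blocks. A minimal Python sketch of which pixel positions get written, with hypothetical example values; it indexes pixels from 0 rather than modeling the pixel_count register update.

```python
def clip_write_enables(max_block: int, leadin_clip_num: int,
                       leadout_clip_num: int) -> list:
    """wr_en for each pixel position of a line, per the clipping pseudocode.

    {max_block, 111} is max_block concatenated with three 1 bits,
    i.e. the index of the last pixel: 8 * max_block + 7.
    """
    last_pixel = (max_block << 3) | 0b111
    return [0 if (p < leadin_clip_num or p > last_pixel - leadout_clip_num) else 1
            for p in range(last_pixel + 1)]


enables = clip_write_enables(max_block=1, leadin_clip_num=2, leadout_clip_num=3)
print(enables)  # [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```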

When a wr_en pulse is sent to the output double-buffer, buff_avail[wr_buff] is set, and wr_buff is inverted.

cfu_hcu_avail is driven by buff_avail[rd_buff]. When cfu_hcu_avail equals 1, this indicates to the HCU that data is available to be read from the CFU. XstartCount (x_start_count) determines the amount by which the first pixel is scaled, while hcu_line_length and hcu_cfu_dotadv control the amount by which the last pixel in a line that is sent to the HCU is scaled.

if (hcu_cfu_dotadv == 1) then
    if (x_scale_count + x_scale_denom - x_scale_num >= 0) then
        x_scale_count = x_scale_count + x_scale_denom - x_scale_num
        rd_en = 1
    else
        x_scale_count = x_scale_count + x_scale_denom
        rd_en = 0
else
    x_scale_count = x_scale_count
    rd_en = 0

When an rd_en pulse is received, buff_avail[rd_buff] is cleared, and rd_buff is inverted.

A counter keeps track of the number of hcu_cfu_dotadv pulses received in each line.
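The handshake above can be simulated to see how XstartCount changes the scaling of the first pixel. This Python sketch is hypothetical: it assumes the upstream side refills the buffer as soon as a half is free and ignores clipping and pipeline latency; class and function names are illustrative.

```python
class OutputDoubleBuffer:
    """Hypothetical model of the 2-entry output buffer between CFU and HCU."""

    def __init__(self) -> None:
        self.data = [None, None]
        self.buff_avail = [False, False]
        self.wr_buff = 0
        self.rd_buff = 0

    @property
    def cfu_hcu_avail(self) -> bool:
        return self.buff_avail[self.rd_buff]

    def write(self, pixel) -> None:      # a wr_en pulse
        self.data[self.wr_buff] = pixel
        self.buff_avail[self.wr_buff] = True
        self.wr_buff ^= 1

    def current(self):                   # what the HCU sees on c[0-3]data
        if not self.cfu_hcu_avail:
            raise RuntimeError("HCU advanced with no pixel available")
        return self.data[self.rd_buff]

    def read_advance(self) -> None:      # an rd_en pulse
        self.buff_avail[self.rd_buff] = False
        self.rd_buff ^= 1


def hcu_dots(pixels, x_num, x_den, x_start_count, n_dots):
    """Pixel seen by the HCU on each of n_dots hcu_cfu_dotadv pulses."""
    buf = OutputDoubleBuffer()
    source = iter(pixels)
    count = x_start_count                # loaded at the start of every line
    out = []
    for _ in range(n_dots):
        # Upstream fills whichever half is free (wr_en pulses).
        while not buf.buff_avail[buf.wr_buff]:
            pixel = next(source, None)
            if pixel is None:
                break
            buf.write(pixel)
        out.append(buf.current())
        # X-scaling counter decides whether this dot releases the pixel (rd_en).
        if count + x_den - x_num >= 0:
            count += x_den - x_num
            buf.read_advance()
        else:
            count += x_den
    return out


print(hcu_dots([10, 20, 30], 2, 1, 0, 6))  # [10, 10, 20, 20, 30, 30]
print(hcu_dots([10, 20, 30], 2, 1, 1, 5))  # [10, 20, 20, 30, 30]
```

With a start count of 1, the first pixel is shown for one dot instead of two, which is the kind of first-pixel adjustment the XstartCount register description mentions for lines clipped on the lead in.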
24 Lossless Bi-level Decoder (LBD)

24.1 Overview

The LBD decompresses a single plane of bi-level data read from DRAM. It is programmed with the start address of the compressed data, the length of the decompressed line and the number of lines to decompress. The LBD can cope with any compression ratio, and a pass-through mode is provided for 1:1 compression. Text typically compresses at around 50:1. Lossless bi-level compression of a typical page is about 20:1, with 10:1 possible for pages which compress poorly.

The decompressed bi-level data is output to the SFU (Spot FIFO Unit), and in turn becomes an input to the HCU (Halftoner/Compositor unit) for the next stage in the printing pipeline. The LBD also outputs an lbd_finishedband control flag that is used by the PCU and is available to the CPU as an interrupt.









[Figure 111. High level block diagram of LBD in context: DRAM Interface Unit, PCU, LBD, Spot FIFO Unit and HCU, with the lbd_finishedband signal to the PCU]



24.2 Main features of LBD 
Figure 112 shows a schematic outline of the LBD and SFU.

The PEC1 LBD outputs 16 bits in parallel to the rest of PEC1. The LBD in SoPEC can run much faster than is required. This is useful for allowing stalls, e.g. due to band processing latency, to be absorbed.

In pass-through mode, a run-length code is followed by pass through of a programmed number of bits.

The LBD outputs decompressed bi-level data to the Spot FIFO Unit (SFU), which keeps lines stored in DRAM (a minimum of 2 lines, nominally 3, up to a programmable number). The SFU may have to wait for write access to DRAM, therefore the LBD must be able to support stalling at its output during a line.
The previous line needed for decoding is provided by the SFU via its PrevLine FIFO. Decoding can stall in the LBD while this FIFO waits to be filled from DRAM.

A configuration register in the LBD controls whether the first line being decoded at the start of a band uses the previous line read from the SFU or uses an all 0's line instead.

The compressed spot data can be read at a rate of 1 bit/cycle for pass-through mode (1:1 compression).

The LBD finished band signal is exported to the PCU and is additionally available to the CPU as an interrupt.



[Figure 112. Schematic outline of the LBD and the SFU: next_line, prev_line and curr_line FIFOs with DRAM read and write paths and output to the HCU. All FIFOs are 64 bytes (twice the DRAM data word width).]






24.2.1 Bi-level Decoding in the LBD

The bi-level data is compressed using vertical (edge delta) codes and horizontal run-length encodings. The encodings are listed in Table 103 and Table 104.




Table 103. Bi-level encodings

Code | Encoding | Description
 | Vertical(1) | a0 <- b1 + 1, color = !color
010 | Vertical(-1) | a0 <- b1 - 1, color = !color
110000 | Vertical(2) | a0 <- b1 + 2, color = !color
010000 | Vertical(-2) | a0 <- b1 - 2, color = !color
100000 | Vertical(3) | a0 <- b1 + 3, color = !color
000000 | Vertical(-3) | a0 <- b1 - 3, color = !color
<RL><RL>100 | Horizontal | a0 <- a0 + <RL> + <RL>



Pass-through mode lasts until the end of the line or for a pre-programmed number of bits, whichever is shorter. The special run-length code that enters pass-through mode is always executed as a run-length code, followed by pass through. The pass-through escape code is a medium-length run-length with a run of less than or equal to 31.



Table 104. Run length (RL) encodings

Code | Description
 | Short White Runlength (5 bits)
RRRRRRRRRR10 | Medium Black Runlength (10 bits)
RRRRRRRR10 | Medium White Runlength (8 bits)
RRRRRRRRRR10 | Medium Black Runlength with RRRRRRRRRR <= 31, Enter pass through
RRRRRRRR10 | Medium White Runlength with RRRRRRRR <= 31, Enter pass through
RRRRRRRRRRRRRRR00 | Long Black Runlength (15 bits)
RRRRRRRRRRRRRRR00 | Long White Runlength (15 bits)



The codes are written from least significant bit at the right to most significant bit at the left.

For data which compresses poorly, it would be easier to pass the data to the LBD as un-compressed data. Pass-through mode was not implemented in the PEC1 version of the LBD. In pass-through mode the data stream is an un-compressed bit stream.
To enter pass-through mode, the LBD takes advantage of the way run lengths can be written. Usually, if one of the runlength pair is less than or equal to 31, it should be encoded as a short runlength. However, under the coding scheme of Table 104 it is still legal to write it as a medium or long runlength. The LBD has been designed so that if a short runlength value is detected in a medium runlength, then once the horizontal command containing this runlength is decoded completely, this will tell the LBD to enter pass-through mode. The data that follows is passed through uncompressed for either a programmed number of bits or until the end of the line, whichever comes first. Once pass-through mode has completed, the current color is the same as the color of the last bit of the passed-through data.
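The escape rule and the termination rule can be expressed as a small function. This Python sketch is hypothetical: it starts after a horizontal command has been decoded, represents bits as a Python list, and does not model how runs are packed in the bitstream; the pass_through_enable and programmed_bits arguments stand in for the PassThroughEnable register and the programmed pass-through length.

```python
SHORT_RUN_MAX = 31  # runs up to 31 would normally use a short runlength code


def pass_through(medium_run: int, pass_through_enable: int,
                 programmed_bits: int, dots_left: int, stream: list):
    """Return (raw bits passed through, color afterwards) after a horizontal command.

    medium_run: run value just decoded from a medium runlength code.
    dots_left: dots remaining in the line after the horizontal command.
    stream: the following compressed bits, one int (0/1) per bit.
    """
    if not pass_through_enable or medium_run > SHORT_RUN_MAX:
        return [], None                      # ordinary medium run: no escape
    n = min(programmed_bits, dots_left)      # whichever comes first
    raw = stream[:n]
    color = raw[-1] if raw else None         # color continues from the last bit
    return raw, color


bits = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
print(pass_through(20, 1, 8, 100, bits))  # ([0, 1, 1, 0, 1, 0, 0, 1], 1)
print(pass_through(20, 1, 8, 5, bits))    # ([0, 1, 1, 0, 1], 1)
```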

DRAM Access Requirements

The compressed page store for contone, bi-level and raw tag data is 2 Mbytes. The LBD will access the compressed page store in single 256-bit DRAM reads. The LBD will need a 256-bit buffer in its interface to the DIU. The LBD's DIU bandwidth requirements are summarized in Table 105.

Table 105. DRAM bandwidth requirements

Direction | Maximum number of cycles between each 256-bit DRAM access | Peak bandwidth (bits/cycle) | Average bandwidth (bits/cycle)
Read | 256 (1:1 compression) (a) | 1 (1:1 compression) | 0.1 (10:1 compression)

(a) At 1:1 compression the LBD requires 1 bit/cycle, or 256 bits every 256 cycles.
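The table entries follow from the LBD producing 1 decompressed bit per cycle, as the table note states. A minimal Python sketch; the 10:1 case also gives the interval between DRAM reads, which the table does not list, and the function name is hypothetical.

```python
from fractions import Fraction


def lbd_read_bandwidth(compression_ratio, output_bits_per_cycle=1,
                       dram_word_bits=256):
    """Return (compressed bits read per cycle, cycles per 256-bit DRAM read)."""
    bits_per_cycle = Fraction(output_bits_per_cycle) / Fraction(compression_ratio)
    return bits_per_cycle, dram_word_bits / bits_per_cycle


for ratio in (1, 10):
    rate, interval = lbd_read_bandwidth(ratio)
    print(f"{ratio}:1 -> {float(rate)} bits/cycle, one read every {interval} cycles")
```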
24.3 Implementation

24.3.1 Definitions of IO

Table 106. LBD Port List

Port name | Pins | I/O | Description
Clocks and Resets
pclk | 1 | In | SoPEC Functional clock.
 | 1 | In | Global reset signal.
Bandstore signals
cdu_endofbandstore[21:5] | 17 | In | Address of the end of the current band of data. 256-bit word aligned DRAM address.
cdu_startofbandstore[21:5] | 17 | In | Address of the start of the current band of data. 256-bit word aligned DRAM address.
lbd_finishedband | 1 | Out | LBD finished band signal to PCU and Interrupt Controller.
DIU Interface signals
lbd_diu_rreq | 1 | Out | LBD read request. A read request must be accompanied by a valid read address.
lbd_diu_radr[21:5] | 17 | Out | Read address to DIU. 17 bits wide (256-bit aligned word).
diu_lbd_rack | 1 | In | Acknowledge from DIU that read request has been accepted and new read address can be placed on lbd_diu_radr.
diu_data[63:0] | 64 | In | Data from DIU to SoPEC Units. First 64 bits is bits 63:0 of 256-bit word; second 64 bits is bits 127:64; third 64 bits is bits 191:128; fourth 64 bits is bits 255:192.
PCU Interface signals
lbd_pcu_rdy | 1 | Out | Ready signal to the PCU. When lbd_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on lbd_pcu_datain is valid.
SFU Interface signals
 | | | Ready signal indicating SFU has previous line data available for reading and is also ready to be written to.
 | | | Advance line signal to previous and next line buffers.
lbd_sfu_ladvword | 1 | Out | Advance word signal for previous line buffer.
24.3.2 Configuration Registers

Table 107. LBD Configuration Registers

Address | Register name | #bits | Reset value | Description
Control registers
0x00 | Reset | | | A write to this register causes a reset of the LBD. This register can be read to indicate the reset state: 0 = reset in progress, 1 = reset not in progress.
0x04 | Go | 1 | 0x0 | Writing 1 to this register starts the LBD. Writing 0 to this register halts the LBD. The Go register is reset to 0 by the LBD when it finishes processing a band. When Go is deasserted the state-machines go to their idle states but all counters and configuration registers keep their values. When Go is asserted all counters are reset, but configuration registers keep their values (i.e. they don't get reset). The LBD should only be started after the SFU is started. This register can be read to determine if the LBD is running (1 = running, 0 = stopped).
Setup registers (constant during processing)
0x08 | LineLength | 16 | 0x0000 | Width of expanded bi-level line (in dots) (must be a multiple of 16 bits).
0x0C | PassThroughEnable | 1 | 0x1 | Writing 1 to this register enables pass-through mode. Writing 0 to this register disables pass-through mode, thereby making the LBD compatible with PEC1.
0x10 | | | | Number of dots for which pass-through mode will last. If the end of the line is reached first then pass-through will be disabled.
Work registers (need to be set up before processing a band)
0x14 | NextBandCurrReadAdr[21:5] (256-bit aligned DRAM address) | 17 | 0x00000 | Shadow register which is copied to CurrReadAdr when (NextBandEnable = 1 & Go = 0). NextBandCurrReadAdr is the address of the start of the next band of compressed bi-level data in DRAM.
0x18 | NextBandLinesRemaining | 15 | 0x0000 | Shadow register which is copied to LinesRemaining when (NextBandEnable = 1 & Go = 0). NextBandLinesRemaining is the number of lines to be decoded in the next band of compressed bi-level data.
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Table 107. LBD Configuration Registers 













piexisBnarrevunesouice 


1 


0x0 


Shadow ffrnifttAf VUhirH •» An«<ki«u4 *^ Omu. 

icyi9lBf WlllCn IS vOpieu vO "tPB^* 

UneSource when (NexfBandEnabte j 
<SGo=s0). 

1 - use the previous fine read from the SFU 
for decoding the first line at the start of the 
next band. 

0 - ignore the previous line read from the 
SFU for decoding the first line at the start 
Of the next band (an all 0*s line is used 
instead). 


0x20 


NextBandEnable 


1 


0x0 


tf {NextBandEnable :== 1 & Go s== 0) then 
•N&xtBandCunrReadAdris copied to 
CurrReadAdr, 

•NextBandLinesRemaming is copied 

to LinesRemwning, 
'NextBandPrBvLine Source is copied 

to PrevLineSource, 
'Go is set, 

-NextBandBnable is cleared. 
To start LBD processing NextBandEnabte 
should be set 


Work registers (re 


ad onfy for externai access) ~ ■ 




CurrReadAdrt21:5J 

(256H3it alfgned DRAM address) 


17 




The current 256-bit aligned read address 
within the conrtpressed bi-level image 
(DRAM address). Read only register. 


0x28 


LinesRemaining 


15 




Count of number of lines remaining to be 
decoded. The band has finished when this 
number reaches 0. Read only register 


Qx2C 


PrevLineSource 


1 




1 - uses the previous line read from the 
SFU for decoding the first line at the start 
of the next band. 

0 - ignores the previous line read from the 
SFU for decoding the first line at the start 
of the next band (an all O's line is used 
instead). 

Read only register 


0x30 


CurrWriteAdr 


15 




The current dot position for writing to the 
SFU. Read only register 


0x34 


FirstUneOIBand 


1 




Indicates whether the current line is con- 
sidered to be the first line of the band. 
Read only register 
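The NextBandEnable rule in Table 107 amounts to a shadow-register handover. The Python sketch below models it in software so that a band restart can be stepped through. The class, field and method names are hypothetical, and the model deliberately ignores register bit widths and PCU access timing.

```python
from dataclasses import dataclass


@dataclass
class LbdBandRegisters:
    """Hypothetical software model of the LBD band registers (Table 107).

    Bit widths and PCU access timing are not modelled.
    """
    # Shadow registers, written by the CPU before a band starts
    next_band_curr_read_adr: int = 0     # NextBandCurrReadAdr[21:5]
    next_band_lines_remaining: int = 0   # NextBandLinesRemaining
    next_band_prev_line_source: int = 0  # NextBandPrevLineSource
    next_band_enable: int = 0            # NextBandEnable
    # Working registers (read only for external access)
    curr_read_adr: int = 0               # CurrReadAdr[21:5]
    lines_remaining: int = 0             # LinesRemaining
    prev_line_source: int = 0            # PrevLineSource
    go: int = 0                          # Go

    def update(self) -> None:
        """If (NextBandEnable == 1 & Go == 0), hand the shadows over and start."""
        if self.next_band_enable == 1 and self.go == 0:
            self.curr_read_adr = self.next_band_curr_read_adr
            self.lines_remaining = self.next_band_lines_remaining
            self.prev_line_source = self.next_band_prev_line_source
            self.go = 1
            self.next_band_enable = 0

    def line_done(self) -> None:
        """One decoded line; Go is cleared when LinesRemaining reaches 0."""
        if self.go == 1 and self.lines_remaining > 0:
            self.lines_remaining -= 1
            if self.lines_remaining == 0:
                self.go = 0


# Program band 1 and start it.
regs = LbdBandRegisters(next_band_curr_read_adr=0x100,
                        next_band_lines_remaining=2,
                        next_band_enable=1)
regs.update()
# Pre-program band 2 while band 1 is still running: nothing is copied yet.
regs.next_band_curr_read_adr = 0x180
regs.next_band_lines_remaining = 3
regs.next_band_enable = 1
regs.update()
regs.line_done()
regs.line_done()   # band 1 finished, Go cleared
regs.update()      # band 2 starts from its shadow registers
```

Setting NextBandEnable while the current band is still running, as in the second half of the example, lets the next band start as soon as Go drops, without the shadow values disturbing the band in progress.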



24.3.3 Starting the LBD between bands 

The LBD should be started after the SFU. The LBD is programmed with a start address for the compressed
bi-level data, a decode line length, the source of the previous line and a count of how many lines to decode.
The LBD's NextBandEnable bit should then be set (this will set LBD Go). The LBD decodes a single band
and then stops, clearing its Go bit and issuing a pulse on lbd_finishedband. The LBD can then be restarted
for the next band, while the HCU continues to process previously decoded bi-level data from the SFU.
There are 4 mechanisms for restarting the LBD between bands:
[...] BandPrevLineSource shadow registers and [...] registers and set NextBandEnable to restart the LBD.
[...] NextBandEnable flag before [...]
24.3.4 Top-level Description

A block diagram of the LBD is shown in Figure 113.



(Diagram blocks: DRAM Interface Unit; the lossless bi-level decoder unit, containing the Stream Decoder, Registers and Resets, Command Controller, Next Edge Unit, Line Fill Unit and End of Band Unit; and the Spot FIFO Unit with its Previous Line Buffer and Next Line Buffer. Labelled signals include pass_through_dotlength, pass_through_enable, prev_line_source, lines_remaining, line_length, lbd_finishedband, sfu_lbd_pldata, lbd_sfu_wdata and lbd_sfu_wdatavalid.)

Figure 113. Block diagram of lossless bi-level decoder 

The LBD contains the following sub-blocks:

Registers and Resets   PCU interface and configuration registers. Also generates the Go and the
                       Reset signals for the rest of the LBD.

Stream Decoder         Accesses the bi-level description from the DRAM through the DIU interface.
                       It decodes the bit stream into a command with arguments, which it then
                       passes to the command controller.

Command Controller     Interprets the command from the stream decoder and provides the line fill
                       unit with a limit address and color to fill the SFU Next Line Buffer. It also
                       provides the next edge unit with a starting address to look for the next edge.

Next Edge Unit         Scans through the Previous Line Buffer using its current address to find
                       the next edge of a color provided by the command controller. The next
                       edge unit outputs this as the next current address back to the command
                       controller and sets a valid bit when this address is at the next edge.

Line Fill Unit         Fills the SFU Next Line Buffer with a color from its current address up to a
                       limit address. The color and limit are provided by the command controller.
In the following description the LBD decodes data for its current line and writes this data into the
SFU's next line buffer.

Names of signals and logical blocks are taken from [18].
The LBD is able to stall mid-line should the SFU be unable to [...] receive a current line frame due to
band processing latency.

All output control signals from the LBD must always be valid after [...] currently decoding, [...] (to the
SFU) [...]

24.3.5 Registers and Resets sub-block description 

[...] The register descriptions for the LBD are [...]. In the case of LinesRemaining, [...] number is
decremented for every line that is completed by the LBD.

If pass through is used during a band, [...] the previous line information [...] is receiving all zeros for
the previous line regardless of the output of the SFU. [...] requesting data from the DIU and commence
decoding of the compressed data stream.



24.3.6 Stream Decoder Sub-block Description

[...] the empty space created by a [...]
A dataflow block diagram of the stream decoder is shown in Figure 114.




EndOfBandStore 



Figure 114. Stream decoder block diagram 



24.3.6.1 DecodeC - Decode Command



DecodeC decodes the command from bits 6..0 of the bit stream to output one of three commands
(runlength/horizontal, vertical or skip). It also provides an output to indicate how many bits were
consumed, which feeds back to the barrel shift register.

There is a fourth command, PASS_THROUGH, which is not encoded in bits 6..0; instead it is inferred from a
special runlength. If the stream decoder detects a short runlength value, i.e. a number less than 31, encoded
as a medium runlength, this tells the Stream Decoder that once the horizontal command containing this run-
length is decoded completely the LBD enters PASS_THROUGH mode. Following the runlength there [...]
uncompressed data. The LBD will stay in PASS_THROUGH mode [...]; this will occur once a programmed
number of bits is reached or the line ends, whichever comes first.
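The "whichever comes first" termination rule reduces to a minimum. A minimal Python sketch, assuming the programmed count is the PassThroughDotLength register of Table 107 expressed in dots and that a0 is the dot position at which pass-through begins; the function name is hypothetical.

```python
def pass_through_dots(a0: int, pass_through_dot_length: int, line_length: int) -> int:
    """Number of uncompressed dots handled in PASS_THROUGH mode (sketch).

    Pass-through stops after PassThroughDotLength dots or at the end of
    the line, whichever comes first.
    """
    if not 0 <= a0 <= line_length:
        raise ValueError("a0 must lie within the line")
    return min(pass_through_dot_length, line_length - a0)


print(pass_through_dots(100, 32, 13824))    # 32: programmed length reached first
print(pass_through_dots(13800, 64, 13824))  # 24: end of line reached first
```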



24.3.6.2 DecodeD - Decode Delta



The DecodeD logic decodes the run length from bits 20..3 of the bit stream. If DecodeC is decoding a vertical
command, [...] 3 on its output. The output [...] is a 15-bit number, which is generally considered to be
positive, but since it only needs to address up to 13824 dots for an A4 page and 19488 dots for an A3 page
(out of 32,768), a 2's complement representation of -3, -2 [...] will work correctly for the data pipeline
that follows. This unit also outputs how many bits were consumed.
In the case of PASS_THROUGH mode, DecodeD parses the bits that [...]
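Why a 15-bit 2's complement encoding of small negative deltas works can be checked with modular arithmetic: adding the encoded value and truncating to 15 bits gives the same result as subtracting, and the encoded values (32765 to 32767) lie far above the largest dot position of 19488. A Python sketch, assuming deltas are added to a0 with a 15-bit adder; the names are illustrative.

```python
DELTA_BITS = 15
DELTA_MASK = (1 << DELTA_BITS) - 1   # 0x7FFF: values 0..32767

A3_LINE_DOTS = 19488                 # widest line the LBD must address


def encode_delta(delta: int) -> int:
    """Encode a signed delta (e.g. -3..3 for vertical commands) in 15 bits."""
    return delta & DELTA_MASK


def apply_delta(a0: int, encoded_delta: int) -> int:
    """Add an encoded delta to the line position, wrapping modulo 2**15."""
    return (a0 + encoded_delta) & DELTA_MASK


for d in (-3, -2, -1):
    # Encoded negative deltas never collide with a real dot position ...
    assert encode_delta(d) > A3_LINE_DOTS
# ... yet plain modular addition still moves a0 backwards.
print(apply_delta(100, encode_delta(-3)))   # 97
print(apply_delta(100, encode_delta(3)))    # 103
```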

24.3.6.3 State-machine 

[...] continuously fetches consecutive DRAM data whenever there is enough free space in [...]

The RUNLENGTH command has two different run lengths. [...]

24.3.7 Command Controller Sub-block Description

A block diagram of the command controller is shown in Figure 115. Note that data names such as [...]



Doc: SoPEC.hardware.design 
Version: 2.3 



S3 Proprietary Document 



29 Nov 2002 
Page 313 



SoPEC : Hardware Design 







Figure 115. Command controller block diagram 



24.3.7.1 State machine 



The following is an explanation of all the states that the state machine utilizes. 
i. START

This is the state that the Command Controller enters when a hard or soft reset occurs or when Go has been
de-asserted. This state cannot be left until the reset has been removed, Go has been asserted and the NEU
(Next Edge Unit), the SD (Stream Decoder) and the SFU are ready.
ii. AWAIT_BUFFER

The NEU contains a buffer memory for the data it receives from the SFU. When the command controller
enters this state the NEU detects this and starts buffering data. The command controller is able to leave this
state when the state machine in the NEU has entered the NEU_RUNNING state. Once this occurs the com-
mand controller can proceed to the PARSE state.
iii. PAUSE_CC

It is possible for the stream decoder to get starved of data if the DRAM is not able to supply replacement
data fast enough. Additionally the SFU can also stall mid-line due to band processing latency. If either of
these cases occurs the LBD needs to pause until the stream decoder gets more of the compressed data stream
from the DRAM or the SFU can receive or deliver new frames. All of the remaining states check if sdvalid
goes to zero (this denotes a starving of the stream decoder) or sfu_lbd_rdy goes to zero, meaning that the
LBD needs to pause. PAUSE_CC is the state that the command controller enters to achieve this, and it does
not leave this state until sdvalid and sfu_lbd_rdy are both asserted and the LBD can recommence decompressing.
iv. PARSE

Once the command controller enters the PARSE state it uses the information that is supplied by the stream
decoder. The first clock cycle of the state sees the sdack signal getting asserted, informing the stream
decoder that the current register information is being used so that it can fetch the next command.






When in this state the command controller can receive one of four valid commands:
a) Runlength or Horizontal

Should the current line position, a0, be added to the delta and the [...] the current frame being processed
by the Line Fill Unit, [...] it is necessary for the command controller to wait for the Line Fill Unit [...]
changes into the WAIT_FOR_RUNLENGTH state [...].

When the current line position, a0, and the delta together equal or exceed the LINE_LENGTH, which is
programmed during initialisation, this denotes that it is the end of the line. The command controller
signals this to the rest of the LBD and then [...] the START state.
b) Vertical

[...] the current position in the [...] black, it is important to [...] on the previous line [...]

c) Skip

[...] commands but the color in the current line is not [...] that the command controller [...] Vertical(0)
commands and has been coded not to change the current color in this case.

d) Pass Through 

[...] cycle that it uses to construct [...] the LBD can recommence [...] color as the last bit in [...] state in
the command controller as each [...] decoder can always be processed in one clock cycle.

v. WAIT_FOR_RUNLENGTH

[...] that the Line Fill Unit [...]. After the first clock cycle the command controller [...]
WAIT_FOR_RUNLENGTH [...] has been completed [...] the command controller will return to [...]

vi. WAIT_FOR_NE

[...] edge in the [...] WAIT_FOR_NE state and remains here until the edge [...]; the command controller
will then return to the PARSE state.
vii. FINISH_LINE

[...] the command controller needs to hold its data for the SFU before going back to the START state. The
command controller remains in the FINISH_LINE state for one clock cycle to achieve this.




Figure 116. State diagram for the Command Controller (CC) state machine 

24.3.8 Next Edge Unit Sub-block Description 

The Next Edge Unit (NEU) is responsible for detecting color changes, or edges, in the previous line based
on [...] supplied by the Command Controller. The NEU is the interface to the [...] previous line [...] edge.
For an edge detect operation the Command [...] of edge, but it could also be [...] is supplied and using
these two values the [...] NEU returns this location to the [...] the Command Controller [...] current line.
The [...] 16 bits in the NEU. In this case the NEU will request more words from the SFU and will keep
searching for an edge. It will continue doing this until it finds an edge or reaches the end of the line.



(Diagram blocks: the Next Edge Unit, containing a buffer, edge detect logic and a state machine, connected to the Command Controller, Line Fill Unit, Stream Decoder and SFU. Labelled signals include prev_line_a, prev_line_b, pl_buff_rdy, sfu_lbd_pldata and lbd_sfu_advline.)

Figure 117. Next Edge Unit block diagram



24.3.8.1 NEU Buffer



[...] for a current line is received [...] and registered [...]. When [...] requests a new frame it needs it on
the next clock cycle to maintain a decoded rate of 2 bits per clock cycle. A more detailed diagram of the
buffer in the NEU is shown in Figure 118.



(Diagram signals: sfu_lbd_pldata (16 bits) in; use_prev_line_a and use_prev_line_b (16 bits each) out; pl_buff_rdy and pl_buff_rdy_dly.)



Figure 118. Next edge unit buffer diagram 

The outputs of the buffer are two 16-bit vectors, use_prev_line_a and use_prev_line_b, that are used to
detect an edge that is relevant to the current line being put together in the Line Fill Unit.

24.3.8.2 NEU Edge Detect

The NEU Edge Detect block takes the two 16-bit vectors supplied by the buffer and, based on the current
line position in the current line, a0, and the current color, sd_color, detects if there is an edge relevant
[...] the same as [...], but it [...] are all the transitions in the 16-bit [...] and including [...]. In bit-wise
terms all the bits above [...]
The decode_b block decodes the 4 lsb of the current address (a0) into a 16-bit mask that controls which
of the data bits are examined. Table 109 shows the truth table for this block.



Table 109. Decode_b truth table 





a0[3:0]  Mask
0000     1111111111111111
0001     1111111111111110
0010     1111111111111100
0011     1111111111111000
0100     1111111111110000
0101     1111111111100000
0110     1111111111000000
0111     1111111110000000
1000     1111111100000000
1001     1111111000000000
1010     1111110000000000
1011     1111100000000000
1100     1111000000000000
1101     1110000000000000
1110     1100000000000000
1111     1000000000000000
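Table 109 is a thermometer code: bit i of the mask is set when i >= a0[3:0], so only dots at or beyond the current position are examined. A Python sketch that regenerates the table, assuming bit 15 is the leftmost character of each row; the function name mirrors the block name but the interface is hypothetical.

```python
def decode_b(a0: int) -> int:
    """16-bit mask selecting bit (a0 mod 16) and every bit above it."""
    low = a0 & 0xF                 # 4 lsb of the current address
    return (0xFFFF << low) & 0xFFFF


for a in range(16):
    print(f"{a:04b}     {decode_b(a):016b}")
```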



For cases when there is a negative vertical command from the stream decoder, it is possible that the edge is
in the three least significant bits of the next frame. The decode_b_ext block supplies the mask so that the
necessary bits can be used by the NEU to detect an edge if present. Table 110 shows the truth table for this
block.



Table 110. Decode_b_ext truth table

Command        decode_b_ext
Vertical(-3)   111
Vertical(-2)   111
Vertical(-1)   011
OTHERS         001



FIRST_FLU_WRITE is only used in the first frame of the current line. 2.2.5 a) in [18] refers to "Processing
the first picture element", in which it states that "The first starting picture element, a0, on each coding line
is imaginarily set at a position just before the first picture element, and is regarded as a white picture ele-
ment". transition_wob and transition_btow are set up to produce this case for every single frame. However it
is only used by the NEU if it is not masked out. This occurs when FIRST_FLU_WRITE is '1', which is only
asserted at the beginning of a line.

2.2.5 b) in [18] covers the case of "Processing the last picture element"; this case states that "The coding of
the coding line continues until the position of the imaginary changing element situated after the last actual
element is coded". This means that no matter what the current color is, the NEU needs to always find an
edge at the end of a line. This feature is used with negative vertical commands.

The vector end_frame is a "one-hot" vector that is asserted during the last frame. It asserts a bit in the end
of line position, as determined by LineLength, and this simulates an edge in this location which is ORed
with the transitions vector. The output of this, masked_data, is sent into the encode_b_one_hot block.
24.3.8.3 Encode_b_one_hot

[...] Table 111 lists the truth table outlining the functionality required by this block.

Table 111. Encode_b_one_hot Truth Table

Input                 Output
XXXXXXXXXXXXXXXXXX1   0000000000000000001
XXXXXXXXXXXXXXXXX10   0000000000000000010
XXXXXXXXXXXXXXXX100   0000000000000000100
XXXXXXXXXXXXXXX1000   0000000000000001000
XXXXXXXXXXXXXX10000   0000000000000010000
XXXXXXXXXXXXX100000   0000000000000100000
XXXXXXXXXXXX1000000   0000000000001000000
XXXXXXXXXXX10000000   0000000000010000000
XXXXXXXXXX100000000   0000000000100000000
XXXXXXXXX1000000000   0000000001000000000
XXXXXXXX10000000000   0000000010000000000
XXXXXXX100000000000   0000000100000000000
XXXXXX1000000000000   0000001000000000000
XXXXX10000000000000   0000010000000000000
XXXX100000000000000   0000100000000000000
XXX1000000000000000   0001000000000000000
XX10000000000000000   0010000000000000000
X100000000000000000   0100000000000000000
1000000000000000000   1000000000000000000
0000000000000000000   0000000000000000000
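Table 111 describes a priority encoder that keeps only the lowest-numbered set bit of masked_data, i.e. the first edge at or after a0. In software the same function is the familiar x & -x bit trick. A Python sketch, assuming the rightmost table column is bit 0 and the vector is 19 bits wide (16 frame bits plus the 3 extension bits); the function name is illustrative.

```python
WIDTH = 19  # 16 frame bits plus the 3 extension bits used for negative verticals


def encode_b_one_hot(masked_data: int) -> int:
    """Keep only the least significant set bit of a 19-bit vector (Table 111)."""
    masked_data &= (1 << WIDTH) - 1
    # In two's complement, x & -x isolates the lowest set bit; 0 stays 0.
    return masked_data & -masked_data


print(bin(encode_b_one_hot(0b1011000)))   # 0b1000
```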



24.3.8.4 Encode_b_4bit
[...] processed. The formula that is implemented to return blp to the command controller is:

    for V(n): blp = (x + n) modulo 16

where x is the [...] extracted from the "one-hot" vector and n is the vertical [...]
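The encode_b_4bit formula can be sketched as follows. This assumes x is the bit index of the one-hot vector produced by encode_b_one_hot and n is the signed vertical offset (-3 to 3); the modulo-16 wrap keeps the result within a 4-bit frame offset. The interface is hypothetical.

```python
def encode_b_4bit(one_hot: int, n: int) -> int:
    """Convert a one-hot edge vector to a 4-bit position, offset by V(n).

    Implements blp = (x + n) mod 16, where x is the index of the set bit.
    """
    if one_hot <= 0 or one_hot & (one_hot - 1):
        raise ValueError("expected exactly one bit set")
    x = one_hot.bit_length() - 1    # index of the single set bit
    return (x + n) % 16             # Python's % is non-negative for n < 0


print(encode_b_4bit(1 << 5, 2))    # 7
print(encode_b_4bit(1 << 17, -3))  # 14: edge in the extension bits, V(-3)
print(encode_b_4bit(1 << 0, -1))   # 15: wraps modulo 16
```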



24.3.8.5 State machine




Figure 120. State diagram for the Next Edge Unit (NEU) state machine 

The following is an explanation of all the states that the NEU state machine utilizes.
i. NEU_START

This is the state that the NEU enters when a hard or soft reset occurs or when Go has been de-asserted. This
state cannot be left until the reset has been removed, Go has been asserted and it detects that the command
controller has entered its AWAIT_BUFFER state. When this occurs the NEU enters the NEU_FILL_BUFF state.

ii. NEU_FILL_BUFF

[...] be decoded, the NEU needs to fill up its buffer with new data from the SFU. The rest of the LBD waits
while the NEU retrieves the first four frames [...]. Once completed it enters the NEU_HOLD state.

iii. NEU_HOLD

[...] the SFU on the last access [...]

iv. NEU_RUNNING

v. NEU_EMPTY

[...] This occurs when the end_of_line signal is detected from the LBD.
24.3.9 Line Fill Unit sub-block description

The Line Fill Unit, LFU, is responsible for filling the next line buffer in the SFU [...] data in blocks of
sixteen bits. The LFU uses the color and [...] and indicates to the SFU that the data is valid by strobing the
lbd_sfu_wdatavalid signal.

A dataflow block diagram of the line fill unit is shown in Figure 121.



(Diagram blocks: the line fill unit, containing a state machine and limit logic, driven by the command controller, Stream Decoder and Next Edge Unit. Labelled signals include hold_sd_color, lfu_state, color_sel_16bit_lf, delta, line_fill_data, work_sfu_wdata, and, to the SFU, lbd_sfu_wdata, lbd_sfu_wdatavalid and lbd_sfu_advline.)



Figure 121. Line fill unit block diagram

The dataflow above has the following blocks: 

24.3.9.1 State Machine 

The following is an explanation of all the states that the LFU state machine utilizes. 
i. LFU_START

This is the state that the LFU enters when a hard or soft reset occurs or when [...]
ii. LFU_NEW_REG
iii. LFU_COMPLETE_REG

[...] not completed in the first clock cycle. The command controller supplies the a0 value and the color
[...] significant bits of [...]. It is a 16-bit wide mask of [...]




Figure 122. State diagram for the Line Fill Unit (LFU) state machine 

24.3.9.2 line_fill_data

[...] color_sel_16bit_lf values and constructs the current frame that [...] line_fill_data. work_sfu_wdata
is exported by the LBD to the SFU as lbd_sfu_wdata.



if (lfu_state == LFU_START) OR (lfu_state == LFU_NEW_REG) then
    work_sfu_wdata = color_sel_16bit_lf
else
    work_sfu_wdata[(15 - limit) downto limit] =
        color_sel_16bit_lf[(15 - limit) downto limit]
25 Spot FIFO Unit (SFU)



25.1 Overview 

The Spot FIFO Unit (SFU) provides the means by which data is transferred between the LBD and the
HCU. By abstracting the buffering mechanism and controls from both units, the interface is clean between
the data user and the data generator. The amount of buffering can also be increased or decreased without
affecting either the LBD or HCU. Scaling of data is performed in the horizontal and vertical directions by
the SFU so that the output to the HCU matches the printer resolution. Non-integer scaling is supported in
both the horizontal and vertical directions. Typically, the scale factor will be the same in both directions
but may be programmed to be different.



25.2 Main features of the SFU 

The SFU replaces the Spot Line Buffer Interface (SLBI) in PEC1. The spot line store is now located in
DRAM.

The SFU outputs the previous line to the LBD, stores the next line produced by the LBD and outputs the
HCU read line. Each interface to DRAM is via a feeder FIFO. The LBD interfaces to the SFU with a data
width of 16 bits. The SFU interfaces to the HCU with a data width of 1 bit.

Since the DRAM word width is 256 bits but the LBD line length is a multiple of 16 bits, a capability to
flush the last multiples of 16 bits at the end of a line into a 256-bit DRAM word is required. There-
fore, SFU reads of DRAM words at the end of a line, which do not fill the DRAM word, will already be
padded.

A signal sfu_lbd_rdy to the LBD indicates that the SFU is available for writing and reading. For the first
LBD line after SFU Go has been asserted, previous line data is not supplied until after the first
lbd_sfu_advline strobe from the LBD (zero data is supplied instead), and sfu_lbd_rdy to the LBD indicates
that the SFU is available for writing. lbd_sfu_advline tells the SFU to advance to the next line.
lbd_sfu_pladvword tells the SFU to supply the next 16 bits of previous line data. Until the number of
lbd_sfu_pladvword strobes received is equivalent to the LBD line length, sfu_lbd_rdy indicates that the
SFU is available for both reading and writing. Thereafter it indicates the SFU is available for writing. The
LBD should not generate lbd_sfu_pladvword or lbd_sfu_advline strobes until sfu_lbd_rdy is asserted.
A signal sfu_hcu_avail indicates that the SFU has data to supply to the HCU. Another signal,
hcu_sfu_advdot, from the HCU, tells the SFU to supply the next dot. The HCU should not generate the
hcu_sfu_advdot signal until sfu_hcu_avail is true. The HCU can therefore stall waiting for the
sfu_hcu_avail signal.

X and Y non-integer scaling of the bi-level dot data is performed in the SFU.

At 1600 dpi the SFU requires 1 dot per cycle for all DRAM channels, 3 dots per cycle in total (read + read
+ write). Therefore the SFU requires two 256-bit DRAM read accesses per 256 cycles and 1 write access every
256 cycles. A single DIU read interface will be shared for reading the current and previous lines from DRAM.
25.3 Bi-level DRAM memory buffer between LBD, SFU and HCU







(a) From high to low address: lbd_nextline_adr, lbd_prevline_adr, hcu_readline_adr, hcu_startreadline_adr.
(b) From high to low address: lbd_nextline_adr, hcu_readline_adr, lbd_prevline_adr, hcu_startreadline_adr.
Key: free buffer space; filled buffer space accessed by LBD interface FIFOs; buffer space read by HCU Read Line FIFO; filled buffer space read by both HCU Read Line FIFO and LBD interface FIFOs.

Figure 123. Bi-level DRAM buffer
[...] the LBD previous line address

The SFU interfaces to DRAM via three FIFOs:

a. The HCUReadLineFIFO, which supplies dot data to the HCU.

b. The LBDNextLineFIFO, which writes decompressed bi-level data from the LBD.
c. The LBDPrevLineFIFO, which reads previous decompressed bi-level data for the LBD.

There are four address pointers used to manage the bi-level DRAM buffer:

a. hcu_readline_adr[21:5] is the read address in DRAM for the HCUReadLineFIFO.

b. hcu_startreadline_adr[21:5] is the start address in DRAM of the current line being read by the
HCUReadLineFIFO.
c. lbd_nextline_adr[21:5] is the write address in DRAM for the LBDNextLineFIFO.
d. lbd_prevline_adr[21:5] is the read address in DRAM for the LBDPrevLineFIFO.
The address pointers must obey certain rules which indicate whether they are valid:

a. hcu_readline_adr[21:5] is only valid if it is reading earlier in the line than lbd_nextline_adr[21:5] is
writing, i.e. hrf_adrvalid = hcu_readline_adr[21:5] /= lbd_nextline_adr[21:5].

b. The SFU cannot overwrite the current line that the HCU is reading from, i.e. hrf_startadrvalid =
lbd_nextline_adr[21:5] /= hcu_startreadline_adr[21:5].

c. The LBDNextLineFIFO must be writing earlier in the line than the LBDPrevLineFIFO is reading
and must not overwrite the current line that the HCU is reading from, i.e. nlf_adrvalid =
lbd_nextline_adr[21:5] /= lbd_prevline_adr[21:5] AND hrf_startadrvalid.

d. The LBDPrevLineFIFO can read right up to the address that the LBDNextLineFIFO is writing, i.e.
plf_adrvalid = lbd_prevline_adr[21:5] /= lbd_nextline_adr[21:5].

e. At startup, i.e. when sfu_go is asserted, the pointers are reset to start_sfu_adr[21:5]. The first
LBDNextLineFIFO data is allowed to be written to lbd_nextline_adr[21:5] even though
nlf_adrvalid is initially invalid.

f. The address pointers can wrap around the SFU bi-level store area in DRAM.
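Rules a to f reduce to simple comparisons on word addresses. The Python sketch below assumes each pointer holds address bits [21:5] as an integer and that the bi-level store occupies a contiguous word range [area_start, area_end). The valid-flag methods follow the rules above; the class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SfuPointers:
    """Bi-level buffer pointers in 256-bit word units (address bits [21:5])."""
    hcu_readline_adr: int
    hcu_startreadline_adr: int
    lbd_nextline_adr: int
    lbd_prevline_adr: int

    def hrf_adrvalid(self) -> bool:        # rule a
        return self.hcu_readline_adr != self.lbd_nextline_adr

    def hrf_startadrvalid(self) -> bool:   # rule b
        return self.lbd_nextline_adr != self.hcu_startreadline_adr

    def nlf_adrvalid(self) -> bool:        # rule c
        return (self.lbd_nextline_adr != self.lbd_prevline_adr
                and self.hrf_startadrvalid())

    def plf_adrvalid(self) -> bool:        # rule d
        return self.lbd_prevline_adr != self.lbd_nextline_adr


def advance_adr(adr: int, area_start: int, area_end: int) -> int:
    """Step a pointer by one word, wrapping inside [area_start, area_end) (rule f)."""
    adr += 1
    return area_start if adr >= area_end else adr


# Rule e: at sfu_go every pointer starts at the same address, so nlf_adrvalid
# is initially false even though the first LBDNextLineFIFO write is allowed.
p = SfuPointers(0x40, 0x40, 0x40, 0x40)
print(p.nlf_adrvalid(), p.plf_adrvalid())   # False False
```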

As a guideline, the typical FIFO size should be a minimum of 2 lines stored in DRAM, nominally 3 lines,
up to a programmable number of lines. A larger buffer allows lines to be decompressed in advance. This
can be useful for absorbing local complexities in compressed bi-level images.



25.4 DRAM ACCESS REQUIREMENTS 

The SFU has 1 read interface to the DIU and 1 write interface. The read interface is shared between the 
previous and current line read FIFOs. 

The spot line store requires 5.1 Kbytes of DRAM to store 3 A4 lines. The SFU will read and write the spot
line store in single 256-bit DRAM accesses. The SFU will need 256-bit double buffers for each of its pre-
vious, current and next line interfaces.
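The 5.1 Kbyte figure follows from the A4 line width quoted in the DecodeD description (13824 dots at 1 bit per dot). A short Python check of that arithmetic, assuming no per-line padding beyond whole 256-bit words; the function name is illustrative.

```python
A4_DOTS_PER_LINE = 13824   # A4 line width quoted in the DecodeD description
DRAM_WORD_BITS = 256


def spot_line_store(lines: int, dots_per_line: int = A4_DOTS_PER_LINE) -> dict:
    """Bytes and 256-bit DRAM words needed to hold `lines` bi-level lines."""
    bits = lines * dots_per_line           # 1 bit per dot
    return {
        "bytes": bits // 8,
        "dram_words": -(-bits // DRAM_WORD_BITS),   # ceiling division
    }


print(spot_line_store(3))   # {'bytes': 5184, 'dram_words': 162}
# At 1 bit per cycle, each channel consumes one 256-bit word every 256 cycles.
```

5184 bytes is about 5.1 Kbytes, and a single A4 line fills exactly 54 DRAM words.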

The SFU's DIU bandwidth requirements are summarized in Table 112.



Table 112. DRAM bandwidth requirements




1: Two separate reads of 1 bit/cycle.
2: Write at 1 bit/cycle.



25.5 SCALING 

Scaling of bi-level data is performed in both the horizontal and vertical directions by the SFU so that the
output to the HCU matches the printer resolution. The SFU supports non-integer scaling with the scale
factor represented by a numerator and a denominator. Only scaling up of the bi-level data is allowed, i.e.
the numerator should be greater than or equal to the denominator. Scaling is implemented using a counter as follows:
if (count + denominator >= numerator) then
    count = (count + denominator) - numerator
    advance = 1
else
    count = count + denominator
    advance = 0
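The counter can be exercised in software. The Python sketch below follows the pseudocode above, assuming count holds the value at the start of each output cycle (so the first value is 0, matching the statement that XstartCount = 0 gives the first value of count in Table 113) and that each output cycle emits one HCU dot. Function names are illustrative.

```python
from typing import Iterator, List, Tuple


def scale_steps(numerator: int, denominator: int,
                count: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield (count, advance) for each output dot, per the pseudocode above.

    `count` is the value at the start of the cycle (XstartCount for the
    first dot of a line).
    """
    if numerator < denominator:
        raise ValueError("only scaling up is supported")
    while True:
        if count + denominator >= numerator:
            nxt, advance = (count + denominator) - numerator, 1
        else:
            nxt, advance = count + denominator, 0
        yield count, advance
        count = nxt


def scale_line(dots: List[int], numerator: int, denominator: int,
               start_count: int = 0) -> List[int]:
    """Replicate input dots; a new input dot is taken only after advance = 1."""
    out: List[int] = []
    i = 0
    steps = scale_steps(numerator, denominator, start_count)
    while i < len(dots):
        out.append(dots[i])
        _, advance = next(steps)
        i += advance
    return out


print(scale_line([1, 0, 1], 7, 3))   # [1, 1, 1, 0, 0, 1, 1]
```

With numerator = 7 and denominator = 3, advance is asserted 3 times in every 7 output cycles, so 3 input dots become 7 output dots.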



X scaling controls whether the SFU supplies the next dot or a copy of the current dot when the HCU [...]

[...] SFU has supplied an entire HCU line of data, the SFU will either re-read the current line from DRAM or
advance to the next line of HCU read data, depending on the programmed Y scale factor.
An example of scaling for numerator = 7 and denominator = 3 is given in Table 113. The signal advance, if
asserted, causes the next input dot to be output on the next cycle; otherwise the same input dot is output.

Table 113. Non-integer scaling example for scaleNum = 7, scaleDenom = 3
2


0 





25.6 Lead-in and lead-out clipping 

To account for the case where there may be two SoPEC devices, each generating its own portion of a dot-
line, [...] replicated the total scale-factor number of times by an individual [...] part of the scaling, one on
[...] beyond the HCU line length will be ignored. Scaling on the lead-in, i.e. of the first valid dot in the
line, is controlled by setting the XstartCount register.

At the start of each line, count in the pseudocode above is set to XstartCount. If there is no lead-in,
XstartCount is set to 0, i.e. the first value of count in Table 113. If there is lead-in, then XstartCount
needs to be set to the appropriate value of count in the sequence above.
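Choosing XstartCount amounts to picking the right entry from the count sequence. The sketch below assumes the lead-in is measured as a whole number of output dots of that sequence that have already been produced before this device's first valid dot; that interpretation and both function names are assumptions.

```python
def count_sequence(numerator: int, denominator: int, length: int,
                   start: int = 0) -> list:
    """Successive values of count from the scaling pseudocode."""
    seq = []
    count = start
    for _ in range(length):
        seq.append(count)
        count += denominator
        if count >= numerator:
            count -= numerator
    return seq


def xstart_count(numerator: int, denominator: int, lead_in_dots: int) -> int:
    """Hypothetical helper: count value after `lead_in_dots` output dots."""
    return count_sequence(numerator, denominator, lead_in_dots + 1)[-1]


print(count_sequence(7, 3, 7))   # [0, 3, 6, 2, 5, 1, 4]
print(xstart_count(7, 3, 2))     # 6
```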
25.7 Interfaces between LBD, SFU and HCU



(Diagram blocks: the SFU, containing the DIU Interface and Address Generator, Previous Line FIFO, Next Line FIFO and Current Line FIFO, sits between the LBD, the DIU and the HCU. LBD-side signals: lbd_sfu_pladvword, sfu_lbd_pldata, lbd_sfu_advline, sfu_lbd_rdy, lbd_sfu_wdata (16 bits) and lbd_sfu_wdatavalid. HCU-side signals: hcu_sfu_advdot, sfu_hcu_sdata and sfu_hcu_avail.)



Figure 124. Interfaces between LBD/SFU/HCU 



25.7.1 LBD-SFU Interfaces



The LBD has two interfaces to the SFU. The LBD writes the next line to the SFU and reads the previous
line from the SFU.



25.7.1.1 LBDNextLineFIFO interface

The LBDNextLineFIFO interface from the LBD to the SFU comprises the following signals:

• lbd_sfu_wdata, 16-bit write data.

• lbd_sfu_wdatavalid, write data valid.

• lbd_sfu_advline, signal indicating LBD has advanced to the next line.

The LBD should not write to the SFU until sfu_lbd_rdy is true. The LBD can therefore stall waiting for the
sfu_lbd_rdy signal.



25.7.1.2 LBDPrevLineFIFO interface

The LBDPrevLineFIFO interface from the SFU to the LBD comprises the following signals:

• sfu_lbd_pldata, 16-bit data.

The previous line read buffer interface from the LBD to the SFU comprises the following signals:

• lbd_sfu_pladvword, signal indicating to the SFU to supply the next 16-bit word.

• lbd_sfu_advline, signal indicating LBD has advanced to the next line.
Previous line data is not supplied until after the first lbd_sfu_advline strobe from the LBD (zero data is
supplied instead). The LBD should not assert lbd_sfu_pladvword unless sfu_lbd_rdy is asserted.

25.7.1.3 Common Control Signals

sfu_lbd_rdy indicates to the LBD that the SFU is available for writing. After the first lbd_sfu_advline and before the number of lbd_sfu_pladvword strobes received is equivalent to the LBD line length, sfu_lbd_rdy indicates that the SFU is available for both reading and writing. Thereafter it indicates the SFU is available for writing.

The LBD should not generate lbd_sfu_pladvword or lbd_sfu_advline strobes until sfu_lbd_rdy is asserted.

25.7.2 SFU-HCU Current Line FIFO Interface 

The interface from the SFU to the HCU comprises the following signals: 

• sfu_hcu_sdata, 1-bit data.

• sfu_hcu_avail, data valid signal indicating that there is data available in the SFU HCUReadLineFIFO.

The interface from the HCU to the SFU comprises the following signals:

• hcu_sfu_advdot, indicating to the SFU to supply the next dot.

The HCU should not generate the hcu_sfu_advdot signal until sfu_hcu_avail is true. The HCU can therefore stall waiting for the sfu_hcu_avail signal.
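The flow control above can be illustrated with a small cycle-level sketch in which the HCU only pulses hcu_sfu_advdot while sfu_hcu_avail is true and otherwise stalls. The refill schedule is a hypothetical stand-in for DRAM reads into the HCUReadLineFIFO; this is a behavioural illustration, not the RTL.

```python
from collections import deque


def run_hcu(refill_schedule, num_cycles):
    """Cycle-level sketch of the SFU-HCU current line handshake.

    refill_schedule: hypothetical mapping cycle -> list of dots that become
    available in the SFU at the start of that cycle.
    Returns (dots consumed in order, number of cycles the HCU stalled).
    """
    fifo = deque()   # dots currently available in the SFU
    consumed = []
    stalls = 0
    for cycle in range(num_cycles):
        fifo.extend(refill_schedule.get(cycle, []))
        sfu_hcu_avail = len(fifo) > 0
        if sfu_hcu_avail:
            # HCU samples sfu_hcu_sdata and pulses hcu_sfu_advdot.
            consumed.append(fifo.popleft())
        else:
            # HCU must not pulse hcu_sfu_advdot, so it waits this cycle.
            stalls += 1
    return consumed, stalls
```

For example, with three dots available at cycle 0 and two more at cycle 5, an 8-cycle run consumes five dots and stalls in cycles 3, 4 and 7.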
25.8 Implementation

25.8.1 Definitions of IO

Table 114. SFU Port List

Port name | Pins | I/O | Description

Clocks and Resets
pclk | 1 | In | SoPEC functional clock.
prst_n | 1 | In | Global reset signal.

DIU Read Interface signals
sfu_diu_rreq | 1 | Out | SFU requests DRAM read. A read request must be accompanied by a valid read address.
sfu_diu_radr[21:5] | 17 | Out | Read address to DIU, 17 bits wide (256-bit aligned word).
diu_sfu_rack | 1 | In | Acknowledge from DIU that read request has been accepted and new read address can be placed on sfu_diu_radr.
diu_data[63:0] | 64 | In | Data from DIU to SoPEC units. First 64 bits are bits 63:0 of the 256-bit word; second 64 bits are bits 127:64; third 64 bits are bits 191:128; fourth 64 bits are bits 255:192.
diu_sfu_rvalid | 1 | In | Signal from DIU telling the SoPEC unit that valid read data is on the diu_data bus.

DIU Write Interface signals
sfu_diu_wreq | 1 | Out | SFU requests DRAM write. A write request must be accompanied by a valid write address together with valid write data and a write valid.
sfu_diu_wadr[21:5] | 17 | Out | Write address to DIU, 17 bits wide (256-bit aligned word).
diu_sfu_wack | 1 | In | Acknowledge from DIU that write request has been accepted and new write address can be placed on sfu_diu_wadr.
sfu_diu_data[63:0] | 64 | Out | Data from SFU to DIU. First 64 bits are bits 63:0 of the 256-bit word; second 64 bits are bits 127:64; third 64 bits are bits 191:128; fourth 64 bits are bits 255:192.
sfu_diu_wvalid | 1 | Out | Signal from PEP unit indicating that data on sfu_diu_data is valid.

PCU Interface data and control signals
pcu_addr[5:2] | 4 | In | PCU address bus. Only 4 bits are required to decode the address space for this block.
pcu_dataout[31:0] | 32 | In | Shared write data bus from the PCU.
sfu_pcu_datain[31:0] | 32 | Out | Read data bus from the SFU to the PCU.
pcu_rwn | 1 | In | Common read/not-write signal from the PCU.
pcu_sfu_sel | 1 | In | Block select from the PCU. When pcu_sfu_sel is high both pcu_addr and pcu_dataout are valid.
sfu_pcu_rdy | 1 | Out | Ready signal to the PCU. When sfu_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on sfu_pcu_datain is valid.

LBD Interface Data and Control Signals
sfu_lbd_rdy | 1 | Out | Signal indicating that the SFU has previous line data available and is ready to be written to.
lbd_sfu_advline | 1 | In | Line advance signal for both next and previous lines.
lbd_sfu_pladvword | 1 | In | Advance word signal for previous line buffer.
sfu_lbd_pldata[15:0] | 16 | Out | Data from the previous line buffer.
lbd_sfu_wdata[15:0] | 16 | In | Write data for next line buffer.
lbd_sfu_wdatavalid | 1 | In | Write data valid signal for next line buffer data.

HCU Interface Data and Control Signals
hcu_sfu_advdot | 1 | In | Signal indicating to the SFU that the HCU is ready to accept the next dot of data from the SFU.
sfu_hcu_sdata | 1 | Out | Bi-level dot data.
sfu_hcu_avail | 1 | Out | Signal indicating valid bi-level dot data on sfu_hcu_sdata.
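The beat ordering listed for diu_data and sfu_diu_data (bits 63:0 first, bits 255:192 last) can be checked with the following sketch, which packs and unpacks one 256-bit DRAM word. The address helper reads a [21:5] address field as a 256-bit (32-byte) aligned word address; that interpretation follows from the bit range but is an assumption of this example.

```python
def assemble_256(beats):
    """Combine the four 64-bit beats of one DIU access into a 256-bit word.

    Beat 0 carries bits 63:0, beat 1 bits 127:64, beat 2 bits 191:128 and
    beat 3 bits 255:192, as listed for diu_data and sfu_diu_data.
    """
    if len(beats) != 4:
        raise ValueError("a 256-bit DIU access is transferred as 4 beats")
    word = 0
    for i, beat in enumerate(beats):
        if not 0 <= beat < 1 << 64:
            raise ValueError("each beat must fit in 64 bits")
        word |= beat << (64 * i)
    return word


def split_256(word):
    """Inverse of assemble_256: the 4 beats in transfer order."""
    mask = (1 << 64) - 1
    return [(word >> (64 * i)) & mask for i in range(4)]


def diu_word_to_byte_address(adr_21_5):
    """Byte address for a 17-bit [21:5] DIU word address (32-byte units)."""
    if not 0 <= adr_21_5 < 1 << 17:
        raise ValueError("address field is 17 bits wide")
    return adr_21_5 << 5
```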
25.8.2 Configuration Registers

Table 115. SFU Configuration Registers

Address | Register | #bits | Reset value | Description

Control registers
0x00 | Reset | 1 | 0x1 | A write to this register causes a reset of the SFU. This register can be read to indicate the reset state: 0 - reset in progress, 1 - reset not in progress.
0x04 | Go | 1 | 0x0 | Writing 1 to this register starts the SFU. Writing 0 to this register halts the SFU. When Go is deasserted the state machines go to their idle states but all counters and configuration registers keep their values. When Go is asserted all counters are reset, but configuration registers keep their values (i.e. they don't get reset). The SFU must be started before the LBD is started. This register can be read to determine if the SFU is running (1 - running, 0 - stopped).

Setup registers (constant during processing of the page)
0x08 | HCUNumDots | 16 | 0x0000 | Width of HCU line (in dots).
0x0C | HCUDRAMWords | 8 | 0x00 | Number of 256-bit DRAM words in an HCU line.
0x10 | LBDNumWords | 12 | 0x000 | Number of 16-bit words in an LBD line (LBD line length must be a multiple of 16 bits).
0x14 | StartSfuAdr[21:5] (256-bit aligned DRAM address) | 17 | 0x00000 | First SFU location in memory.
0x18 | EndSfuAdr[21:5] (256-bit aligned DRAM address) | 17 | 0x00000 | Last SFU location in memory.
0x1C | XstartCount | 8 | 0x00 | Value to be loaded at the start of every line into the counter used for scaling in the X direction. Used to control the scaling of the first dot in a line. This value will typically equal zero, except in the case where a number of dots are clipped on the lead-in to a line.
0x20 | XscaleNum | 8 | 0x01 | Numerator of spot data scale factor in X direction.
0x24 | XscaleDenom | 8 | 0x01 | Denominator of spot data scale factor in X direction.
0x28 | YscaleNum | 8 | 0x01 | Numerator of spot data scale factor in Y direction.
0x2C | YscaleDenom | 8 | 0x01 | Denominator of spot data scale factor in Y direction.

Work registers (PCU has read-only access)
0x30 | HCUReadLineAdr[21:5] (256-bit aligned DRAM address) | 17 | - | Current address pointer in DRAM to HCU read data. Read only register.
0x34 | HCUStartReadLineAdr[21:5] (256-bit aligned DRAM address) | 17 | - | Start address in DRAM of line being read by HCU buffer in DRAM. Read only register.
0x38 | LBDNextLineAdr[21:5] (256-bit aligned DRAM address) | 17 | - | Current address pointer in DRAM to LBD write data. Read only register.
0x3C | LBDPrevLineAdr[21:5] (256-bit aligned DRAM address) | 17 | - | Current address pointer in DRAM to LBD read data. Read only register.
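As a sketch of how software might derive the line-length setup registers, the helper below assumes the bi-level DRAM buffer stores one bit per dot, so an HCU line of N dots occupies ceil(N / 256) 256-bit DRAM words. The helper name and that assumption are illustrative only; the offsets and field widths are taken from Table 115.

```python
# Register byte offsets within the SFU address space, from Table 115.
SFU_REGS = {
    "Reset": 0x00, "Go": 0x04,
    "HCUNumDots": 0x08, "HCUDRAMWords": 0x0C, "LBDNumWords": 0x10,
    "StartSfuAdr": 0x14, "EndSfuAdr": 0x18, "XstartCount": 0x1C,
    "XscaleNum": 0x20, "XscaleDenom": 0x24,
    "YscaleNum": 0x28, "YscaleDenom": 0x2C,
}

# Field widths (in bits) of the line-length setup registers, from Table 115.
_WIDTHS = {"HCUNumDots": 16, "HCUDRAMWords": 8, "LBDNumWords": 12}


def sfu_line_setup(hcu_num_dots, lbd_line_bits):
    """Hypothetical helper deriving the line-length setup register values.

    Assumes one bit per dot in the bi-level DRAM buffer.
    """
    if lbd_line_bits % 16:
        raise ValueError("LBD line length must be a multiple of 16 bits")
    values = {
        "HCUNumDots": hcu_num_dots,
        "HCUDRAMWords": (hcu_num_dots + 255) // 256,  # integer ceiling
        "LBDNumWords": lbd_line_bits // 16,
    }
    for name, value in values.items():
        if value >= 1 << _WIDTHS[name]:
            raise ValueError(f"{name}={value} does not fit in {_WIDTHS[name]} bits")
    return values
```

Under this assumption, a 1000-dot HCU line needs 4 DRAM words, with the last word only partly used; that partial word is the padding case handled by the FIFO sub-blocks below.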
25.8.3 SFU Sub-block Partition

[Figure: the SFU divided into the PCU Interface, LBD Previous Line FIFO, LBD Next Line FIFO, HCU Read Line FIFO and DIU Interface Address Generator Unit (DAG) sub-blocks, with their connections to the PCU, LBD, HCU and DIU. Internal signals include sfu_go (to all sub-blocks), lbd_num_words, hcu_num_dots, start_sfu_adr, end_sfu_adr, xstart, xscale_num, xscale_denom, yscale_num, yscale_denom and the address pointers hcu_readline_adr, hcu_startreadline_adr, lbd_nextline_adr and lbd_prevline_adr.]

Figure 125. SFU Sub-Block Partition

The SFU contains a number of sub-blocks:

PCU Interface | PCU interface, configuration and status registers. Also generates the Go and the Reset signals for the rest of the SFU.
LBD Previous Line FIFO | Contains FIFO which is read by the LBD previous line interface.
LBD Next Line FIFO | Contains FIFO which is written by the LBD next line interface.
HCU Read Line FIFO | Contains FIFO which is read by the HCU interface.
DIU Interface and Address Generator | Contains DIU read interface and DIU write interface. Manages the address pointers for the bi-level DRAM buffer. Contains X and Y scaling logic.



The various FIFO sub-blocks have no knowledge of where in DRAM their read or write data is stored. In this sense the FIFO sub-blocks are completely decoupled from the bi-level DRAM buffer. All DRAM address management is centralised in the DIU Interface and Address Generation sub-block. DRAM access is pre-emptive, i.e. after a FIFO unit has made an access, then as soon as the FIFO has space to read or data to write, a DIU access will be requested immediately. This ensures there are no unnecessary stalls introduced, e.g. at the end of an LBD or HCU line.

There now follows a description of the SFU sub-blocks.

25.8.4 PCU Interface Sub-block

The PCU interface sub-block provides for the CPU to access SFU-specific registers by reading or writing to the SFU address space.

25.8.5 LBDPrevLineFIFO sub-block

Table 116: LBDPrevLineFIFO Additional IO Definitions

Port name | Pins | I/O | Description

Internal Output
plf_rdy | 1 | Out | Signal indicating LBDPrevLineFIFO is ready to be read from. Until the first lbd_sfu_advline for a band has been received and after the number of lbd_sfu_pladvword strobes received for a line is equal to LBDNumWords, plf_rdy is always asserted. During the second and subsequent lines plf_rdy is deasserted whenever the LBDPrevLineFIFO is empty.

DIU and Address Generation sub-block Signals
plf_diurreq | 1 | Out | Signal indicating the LBDPrevLineFIFO has 256 bits of data free.
plf_diurack | 1 | In | Acknowledge that read request has been accepted and plf_diurreq should be deasserted.
plf_diurdata | 64 | In | Data from the DIU to LBDPrevLineFIFO. First 64 bits are bits 63:0 of the 256-bit word; second 64 bits are bits 127:64; third 64 bits are bits 191:128; fourth 64 bits are bits 255:192.
plf_diurvalid | 1 | In | Signal indicating data on plf_diurdata is valid.
plf_diuidle | 1 | Out | Signal indicating DIU state machine is in the IDLE state.



25.8.5.1 General Description

The LBDPrevLineFIFO sub-block comprises a double 256-bit buffer between the LBD and the DIU Interface and Address Generator sub-block. The FIFO is implemented as 8 x 64-bit words. The FIFO is written by the DIU Interface and Address Generator sub-block and read by the LBD.

[Figure: an 8-word x 64-bit FIFO written from plf_diurdata (write_adr, write_en) and read (read_adr) through a 64-to-16-bit word selector onto sfu_lbd_pldata, with a ZERO input to the output path. The FIFO control logic takes lbd_sfu_pladvword, lbd_sfu_advline, lbd_num_words, plf_diurack and plf_diurvalid and generates plf_rdy and plf_diurreq.]

Figure 126. LBDPrevLineFIFO Sub-block

Whenever 4 locations in the FIFO are free the FIFO will request 256 bits of data from the DIU Interface and Address Generation sub-block by asserting plf_diurreq. A signal plf_diurack indicates that the request has been accepted and plf_diurreq should be deasserted.

The data is written to the FIFO as 64 bits on plf_diurdata[63:0] over 4 clock cycles. The signal plf_diurvalid indicates that the data returned on plf_diurdata[63:0] is valid. plf_diurvalid is used to generate the FIFO write enable, write_en, and to increment the FIFO write address, write_adr[2:0]. If the LBDPrevLineFIFO still has 256 bits free then plf_diurreq should be asserted again.

The DIU Interface and Address Generation sub-block handles all address pointer management and DIU interfacing and decides whether to acknowledge a request for data from the FIFO.



[Timing diagram: pclk, plf_diurreq, plf_diurack, plf_diurvalid and plf_diurdata[63:0].]

Figure 127. Timing of signals on the LBDPrevLineFIFO interface to DIU and Address Generator

The state diagram of the LBDPrevLineFIFO DIU interface is shown in Figure 128. If sfu_go is deasserted then the state machine returns to its Idle state.

[State diagram: on reset the machine enters Idle (diuidle = 1). When 256 bits are free in the FIFO it moves to Request (diurreq = 1, diuidle = 0). On diurack = 1 it moves to Ack (diurreq = 0), then through Data0, Data1, Data2 and Data3, advancing on each diurvalid = 1.]

Figure 128. LBDPrevLineFIFO DIU Interface State Diagram
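The state diagram can be written as a next-state function, shown below as a Python sketch. It follows the Idle, Request, Ack, Data0 to Data3 sequence and the sfu_go reset rule from the text; the unconditional Data3 to Idle transition is an assumption, since the diagram residue does not show the return edge.

```python
from enum import Enum, auto


class DiuReadState(Enum):
    IDLE = auto()
    REQUEST = auto()
    ACK = auto()
    DATA0 = auto()
    DATA1 = auto()
    DATA2 = auto()
    DATA3 = auto()


def diu_read_next_state(state, sfu_go, fifo_has_256_free, diurack, diurvalid):
    """Next-state sketch for a FIFO's DIU read interface."""
    S = DiuReadState
    if not sfu_go:
        return S.IDLE                      # deasserting sfu_go returns to Idle
    if state is S.IDLE:
        return S.REQUEST if fifo_has_256_free else S.IDLE
    if state is S.REQUEST:
        return S.ACK if diurack else S.REQUEST
    # Ack -> Data0 -> Data1 -> Data2 -> Data3, one step per diurvalid beat.
    data_chain = {S.ACK: S.DATA0, S.DATA0: S.DATA1,
                  S.DATA1: S.DATA2, S.DATA2: S.DATA3}
    if state in data_chain:
        return data_chain[state] if diurvalid else state
    return S.IDLE                          # Data3 -> Idle (assumed)


def diu_read_outputs(state):
    """Moore outputs: diurreq only in Request, diuidle only in Idle."""
    return {"diurreq": state is DiuReadState.REQUEST,
            "diuidle": state is DiuReadState.IDLE}
```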

The LBD reads 16-bit wide data from the LBDPrevLineFIFO on sfu_lbd_pldata[15:0]. lbd_sfu_pladvword from the LBD tells the LBDPrevLineFIFO to supply the next 16-bit word. The FIFO control logic generates a signal word_select which selects the next 16 bits of the 64-bit FIFO word to output on sfu_lbd_pldata[15:0]. When the entire current 64-bit FIFO word has been read by the LBD, lbd_sfu_pladvword will cause the next word to be popped from the FIFO.
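The read port just described, together with the zero-data rule in the next paragraph, can be sketched behaviourally as follows. The slice order (word_select 0 presenting bits 15:0) is an assumption made for illustration; the text does not state which 16-bit slice is output first.

```python
from collections import deque


class PrevLineReader:
    """Behavioural sketch of the LBD-facing read port of LBDPrevLineFIFO."""

    def __init__(self, fifo_words):
        self.fifo = deque(fifo_words)  # 64-bit words already fetched from DRAM
        self.word_select = 0           # which 16-bit slice is presented
        self.seen_advline = False      # first lbd_sfu_advline since sfu_go?

    def advline(self):
        """One lbd_sfu_advline strobe."""
        self.seen_advline = True

    def pldata(self):
        """Value on sfu_lbd_pldata[15:0]; zero until the first advline."""
        if not self.seen_advline or not self.fifo:
            return 0
        return (self.fifo[0] >> (16 * self.word_select)) & 0xFFFF

    def pladvword(self):
        """One lbd_sfu_pladvword strobe."""
        if not self.seen_advline:
            return  # strobes are ignored until the first advline
        self.word_select += 1
        if self.word_select == 4:  # whole 64-bit word consumed: pop it
            self.word_select = 0
            self.fifo.popleft()
```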

Previous line data is not supplied until after the first lbd_sfu_advline strobe from the LBD after sfu_go is asserted (zero data is supplied instead). Until the first lbd_sfu_advline strobe after sfu_go, lbd_sfu_pladvword strobes are ignored.

The LBDPrevLineFIFO control logic uses a counter, pladvword_count[11:0], to count the number of lbd_sfu_pladvword strobes received for the line. The pladvword_count counter is reset to 0 by lbd_sfu_advline and indicates when the number of lbd_sfu_pladvword strobes received is equal to LBDNumWords.

The LBDPrevLineFIFO generates a signal plf_rdy to indicate that it has data available. Until the first lbd_sfu_advline for a band has been received and after the number of lbd_sfu_pladvword strobes received for a line is equal to LBDNumWords, plf_rdy is always asserted. During the second and subsequent lines plf_rdy is deasserted whenever the LBDPrevLineFIFO is empty.
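The plf_rdy rule above reduces to a small combinational function. In the sketch below the inputs are named after the signals in the text, and the counter comparison uses >= only as a defensive choice; the text specifies equality with LBDNumWords.

```python
def plf_rdy(seen_first_advline, pladvword_count, lbd_num_words, fifo_empty):
    """Sketch of the plf_rdy rule for the LBDPrevLineFIFO.

    Before the first lbd_sfu_advline of a band, and once a full line of
    lbd_sfu_pladvword strobes (LBDNumWords) has been received, plf_rdy is
    held high so it never holds off LBD writes. Otherwise it is high only
    while the FIFO holds data.
    """
    if not seen_first_advline or pladvword_count >= lbd_num_words:
        return True
    return not fifo_empty
```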

The last 256-bit word for a line read from DRAM can contain extra padding which should not be output to the LBD. This is because lbd_num_words may not fit exactly into a 256-bit DRAM word. When the count of the number of lbd_sfu_pladvword strobes received for a line is equal to lbd_num_words, the LBDPrevLineFIFO must adjust the FIFO read address to point to the next 256-bit word boundary in the FIFO. This can be achieved by considering the FIFO read address, read_adr[2:0], which will require 3 bits to address 8 locations of 64 bits. The next 256-bit aligned address is calculated by inverting the MSB of read_adr and setting all other bits to 0.

    if (pladvword_count == lbd_num_words) then
        read_adr[1:0] = b00
        read_adr[2] = ~read_adr[2]
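The same read-address adjustment can be expressed as a bit operation on the 3-bit read_adr. This sketch assumes the adjustment is applied to the address of the word just read, in place of the normal increment; with that reading, a line ending exactly on a 256-bit boundary (read_adr = 3 or 7) moves to the same place a normal increment would.

```python
def realign_read_adr(read_adr):
    """Jump to the start of the other 256-bit half of the 8 x 64-bit FIFO.

    Keeps only bit 2 (clearing read_adr[1:0]) and then inverts it, which
    discards any padding left in the current half.
    """
    if not 0 <= read_adr < 8:
        raise ValueError("read_adr is a 3-bit address")
    return (read_adr & 0b100) ^ 0b100
```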



25.8.6 LBDNextLineFIFO sub-block

Table 117. LBDNextLineFIFO Additional IO Definitions

Port name | Pins | I/O | Description
nlf_rdy | 1 | Out | Signal indicating LBDNextLineFIFO is ready to be written to, i.e. there is space in the FIFO.

DIU and Address Generation sub-block Signals
nlf_diuwreq | 1 | Out | Signal indicating the LBDNextLineFIFO has 256 bits of data for writing to the DIU.
nlf_diuwack | 1 | In | Acknowledge from DIU that write request has been accepted and write data can be output on nlf_diuwdata together with nlf_diuwvalid.
nlf_diuwdata | 64 | Out | Data from LBDNextLineFIFO to DIU Interface. First 64 bits are bits 63:0 of the 256-bit word; second 64 bits are bits 127:64; third 64 bits are bits 191:128; fourth 64 bits are bits 255:192.
nlf_diuwvalid | 1 | Out | Signal indicating that data on nlf_diuwdata is valid.



25.8.6.1 General Description

The LBDNextLineFIFO sub-block comprises a double 256-bit buffer between the LBD and the DIU Interface and Address Generator sub-block. The FIFO is implemented as 8 x 64-bit words. The FIFO is written by the LBD and read by the DIU Interface and Address Generator.

[Figure: 16-bit lbd_sfu_wdata is registered in sfu_wdata_reg and assembled by word_select into 64-bit words written into an 8-word x 64-bit FIFO (write_adr, write_en). The FIFO is read (read_adr) onto nlf_diuwdata. The FIFO control logic takes lbd_sfu_wdatavalid, lbd_sfu_advline, lbd_num_words and nlf_diuwack and generates nlf_rdy, nlf_diuwreq and nlf_diuwvalid.]

Figure 129. LBDNextLineFIFO Sub-block

Whenever 4 locations in the FIFO are full the FIFO will request 256 bits of data to be written to the DIU Interface and Address Generator by asserting nlf_diuwreq. A signal nlf_diuwack indicates that the request has been accepted and nlf_diuwreq should be deasserted. On receipt of nlf_diuwack, the data is sent to the DIU Interface as 64 bits on nlf_diuwdata[63:0] over 4 clock cycles. The signal nlf_diuwvalid indicates that the data on nlf_diuwdata[63:0] is valid. nlf_diuwvalid should be asserted with the smallest latency after nlf_diuwack. If the LBDNextLineFIFO still has 256 bits more to transfer then nlf_diuwreq should be asserted again.
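The request rule above can be sketched at the level of 64-bit FIFO words: nlf_diuwreq is high while at least 4 locations are full, and an acknowledge drains one 256-bit block as 4 beats. This model ignores the 16-bit packing and cycle timing, and treats nlf_rdy as "at least one free 64-bit location", which is a simplification.

```python
from collections import deque


class NextLineFifo:
    """Word-level sketch of the LBDNextLineFIFO write-side DIU request."""

    DEPTH = 8  # 8 x 64-bit locations: a double 256-bit buffer

    def __init__(self):
        self.words = deque()

    @property
    def nlf_rdy(self):
        return len(self.words) < self.DEPTH

    @property
    def nlf_diuwreq(self):
        return len(self.words) >= 4  # a full 256-bit block is waiting

    def push(self, word64):
        if not self.nlf_rdy:
            raise OverflowError("write to a full LBDNextLineFIFO")
        self.words.append(word64)

    def on_diuwack(self):
        """Beats driven on nlf_diuwdata, one per cycle with nlf_diuwvalid."""
        if not self.nlf_diuwreq:
            raise RuntimeError("nlf_diuwack without a pending request")
        return [self.words.popleft() for _ in range(4)]
```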



[Timing diagram: pclk, nlf_diuwreq, nlf_diuwack, nlf_diuwdata[63:0] (four beats, 1 to 4) and nlf_diuwvalid.]

Figure 130. Timing of signals on LBDNextLineFIFO interface to DIU and Address Generator

The state diagram of the LBDNextLineFIFO DIU interface is shown in Figure 131. If sfu_go is deasserted then the state machine returns to its Idle state.

[State diagram: Idle moves to Request (diuwreq = 1) when 256 bits are in the FIFO, then to Ack (diuwreq = 0), then through Data0, Data1, Data2 and Data3 with diuwvalid = 1.]

Figure 131. LBDNextLineFIFO DIU Interface State Diagram

The signal nlf_rdy indicates that the LBDNextLineFIFO has space for writing by the LBD. The LBD writes 16-bit data on lbd_sfu_wdata[15:0], and lbd_sfu_wdatavalid indicates that the data is valid.

The data is collected to make up a 64-bit word before being written to the FIFO.

The LBDNextLineFIFO control logic counts the number of lbd_sfu_wdatavalid strobes. The lbd_sfu_wdatavalid …
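Collecting four 16-bit LBD words into one 64-bit FIFO word can be sketched as below. The packing order (first 16-bit word in bits 15:0) is an assumption chosen to match the low-to-high order the document uses for 64-bit beats within a 256-bit word, and zero-padding a trailing partial group is also an assumption of this example.

```python
def pack_16_to_64(words16):
    """Pack 16-bit LBD write words into 64-bit FIFO words (assumed order).

    The first word of each group of four lands in bits 15:0; a trailing
    partial group is returned zero-padded.
    """
    packed = []
    for i in range(0, len(words16), 4):
        word = 0
        for j, w in enumerate(words16[i:i + 4]):
            if not 0 <= w <= 0xFFFF:
                raise ValueError("LBD write data is 16 bits wide")
            word |= w << (16 * j)
        packed.append(word)
    return packed
```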



25.8.7 sfu_lbd_rdy Generation 



The SFU generates sfu_lbd_rdy by ANDing plf_rdy from the LBDPrevLineFIFO with nlf_rdy from the LBDNextLineFIFO.

sfu_lbd_rdy indicates to the LBD that the SFU is available for writing, i.e. there is space available in the LBDNextLineFIFO. After the first lbd_sfu_advline and before the number of lbd_sfu_pladvword strobes received is equivalent to the line length, sfu_lbd_rdy indicates that the SFU is available for both reading (i.e. there is data in the LBDPrevLineFIFO) and writing. Thereafter it indicates the SFU is available for writing.
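Combining the two ready signals gives the behaviour described above. In this sketch in_read_window is a hypothetical flag summarising "after the first lbd_sfu_advline and before LBDNumWords lbd_sfu_pladvword strobes for the current line"; only inside that window can an empty LBDPrevLineFIFO hold off the LBD.

```python
def sfu_lbd_rdy(in_read_window, prev_fifo_empty, next_fifo_full):
    """Sketch of sfu_lbd_rdy as the AND of plf_rdy and nlf_rdy."""
    plf_rdy = not (in_read_window and prev_fifo_empty)
    nlf_rdy = not next_fifo_full
    return plf_rdy and nlf_rdy
```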
25.8.8 LBD-SFU Interfaces Timing Waveform Description

In Figure 132, sfu_wdata_reg is the register in the SFU that registers the data written from the LBD and virtual_nladr is the virtual write address. Similarly, lbd_pldata_reg is the register in the LBD that registers the data read from the SFU and virtual_pladr is the virtual read address.

The main points to note from Figure 132 are:

• In clock cycle 2 sfu_lbd_rdy detects that it has only space to receive 2 more 16-bit words from the LBD after the current clock cycle.

• The data on lbd_sfu_wdata is valid and this is indicated by lbd_sfu_wdatavalid being asserted. Because the data is pipelined it is not captured in the SFU until clock cycle 3 (one of the two remaining spaces in the SFU FIFO).

• In clock cycle 3 sfu_lbd_rdy is deasserted; however, the LBD cannot react to this signal until clock cycle 4. So in clock cycle 3 there is also valid data from the LBD, which is captured in clock cycle 4 by the SFU, thus taking the last available location in the FIFO in the SFU. In clock cycle 4 the LBD has entered a pause mode and waits for sfu_lbd_rdy to be asserted again.

• sfu_lbd_rdy is asserted in clock cycle 7 (it could be any clock cycle), and this occurs once the SFU has at least 2 16-bit FIFO locations available again. The LBD detects this and in clock cycle 8 it starts outputting data by asserting lbd_sfu_wdatavalid and putting new data out, which is registered by the SFU in clock cycle 9.
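Why sfu_lbd_rdy must drop while 2 locations are still free can be checked with a simplified simulation. It assumes the LBD always has data, reacts to sfu_lbd_rdy one cycle later, and the SFU registers the data one further cycle later, so a write lands in cycle t only if sfu_lbd_rdy was high in cycle t - 2; nothing drains the FIFO (the worst case). Under these assumptions a threshold of 2 fills the FIFO exactly, while a threshold of 1 overflows it.

```python
def simulate_lbd_writes(free_slots, num_cycles, threshold=2):
    """Return the minimum free-slot count seen; negative means overflow.

    sfu_lbd_rdy in cycle t is high when more than `threshold` locations
    are free at the start of that cycle.
    """
    rdy_history = []
    min_free = free_slots
    for t in range(num_cycles):
        rdy_history.append(free_slots > threshold)
        if t >= 2 and rdy_history[t - 2]:
            free_slots -= 1  # write registered two cycles after rdy was high
            min_free = min(min_free, free_slots)
    return min_free
```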

There is an apparent corner case on the read side which should be highlighted. On examination this turns out not to be an issue.

Scenario 1:

sfu_lbd_rdy will go low when there is still 1 piece of data in the FIFO. If there is an lbd_sfu_pladvword pulse in the next cycle the data will appear on sfu_lbd_pldata[15:0].

Scenario 2:

sfu_lbd_rdy will go low when there is still 1 piece of data in the FIFO. If there is no lbd_sfu_pladvword pulse in the next cycle and it is not the end of the page, then the SFU will read the data for the next line from DRAM and the read FIFO will fill more, sfu_lbd_rdy will assert again, and so the data will appear on sfu_lbd_pldata[15:0].

Scenario 3:

sfu_lbd_rdy will go low when there is still 1 piece of data in the FIFO. If there is no lbd_sfu_pladvword pulse in the next cycle and it is the end of the page, then the SFU will do no more reads from DRAM, sfu_lbd_rdy will remain deasserted, and the data will not be read out from the FIFO. However, the last line of data on the page is not needed for decoding in the LBD and will not be read by the LBD, so scenario 3 will never apply.

[Waveforms: write from LBD to SFU, and read from SFU to LBD, over clock cycles 1 to 10.]

Figure 132. Signal waveforms between LBD and SFU



Doc: SoPEC_hardware_design 
Version: 2.3 



S3 Proprietary Document 



29 Nov 2002 
Page 345 



SoPEC : Hardware Design 



25.8.9 HCUReadLineFIFO sub-block

Table 118. HCUReadLineFIFO Additional IO Definitions

Port name | Pins | I/O | Description

DIU and Address Generation sub-block Signals
hrf_xadvance | 1 | In | Signal from horizontal scaling unit: 1 - supply the next dot; 0 - supply the current dot.
hrf_hcuendofline | 1 | Out | Signal lasting 1 cycle indicating the end of the HCU read line.
hrf_diurreq | 1 | Out | Signal indicating the HCUReadLineFIFO has space for 256 bits of DIU data.
hrf_diurack | 1 | In | Acknowledge that read request has been accepted and hrf_diurreq should be deasserted.
hrf_diurdata | 64 | In | Data from DIU to HCUReadLineFIFO. First 64 bits are bits 63:0 of the 256-bit word; second 64 bits are bits 127:64; third 64 bits are bits 191:128; fourth 64 bits are bits 255:192.
hrf_diurvalid | 1 | In | Signal indicating data on hrf_diurdata is valid.
hrf_diuidle | 1 | Out | Signal indicating DIU state machine is in the IDLE state.



25.8.9.1 General Description

The HCUReadLineFIFO sub-block comprises a double 256-bit buffer between the HCU and the DIU Interface and Address Generator sub-block. The FIFO is implemented as 8 x 64-bit words. The FIFO is written by the DIU Interface and Address Generator sub-block and read by the HCU.



[Figure: an 8-word x 64-bit FIFO written from hrf_diurdata (write_adr, write_en) and read (read_adr) through a bit selector (bit_select) onto sfu_hcu_sdata. The FIFO control logic takes hcu_sfu_advdot, hcu_num_dots, hrf_xadvance, hrf_diurack and hrf_diurvalid and generates sfu_hcu_avail, hrf_diurreq and hrf_hcuendofline.]

Figure 133. HCUReadLineFIFO Sub-block

The DIU Interface and Address Generation (DAG) sub-block interface of the HCUReadLineFIFO is identical to the LBDPrevLineFIFO DIU interface.

Whenever 4 locations in the FIFO are free the FIFO will request 256 bits of data from the DAG sub-block by asserting hrf_diurreq. A signal hrf_diurack indicates that the request has been accepted and hrf_diurreq should be deasserted.

The data is written to the FIFO as 64 bits on hrf_diurdata[63:0] over 4 clock cycles. The signal hrf_diurvalid indicates that the data returned on hrf_diurdata[63:0] is valid. hrf_diurvalid is used to generate the FIFO write enable, write_en, and to increment the FIFO write address, write_adr[2:0]. If the HCUReadLineFIFO still has 256 bits free then hrf_diurreq should be asserted again.

The HCUReadLineFIFO generates a signal sfu_hcu_avail to indicate that it has data available for the HCU. The HCU reads single-bit data supplied on sfu_hcu_sdata. The FIFO control logic generates a signal bit_select which selects the next bit of the 64-bit FIFO word to output on sfu_hcu_sdata. The signal hcu_sfu_advdot tells the HCUReadLineFIFO to supply the next dot (hrf_xadvance = 1) or the current dot (hrf_xadvance = 0) on sfu_hcu_sdata, according to the hrf_xadvance signal from the scaling control unit in the DAG sub-block. The HCU should not generate the hcu_sfu_advdot signal until sfu_hcu_avail is true. The HCU can therefore stall waiting for the sfu_hcu_avail signal.

When the entire current 64-bit FIFO word has been read by the HCU, hcu_sfu_advdot will cause the next word to be popped from the FIFO.
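The bit selection and dot replication described above can be sketched as follows. The sketch assumes bit_select 0 addresses bit 0 of the current 64-bit word, and that each hcu_sfu_advdot strobe either keeps the current dot (hrf_xadvance = 0) or moves to the next one (hrf_xadvance = 1); neither detail is fixed by the text.

```python
from collections import deque


class HcuReadPort:
    """Sketch of the HCU-facing side of the HCUReadLineFIFO."""

    def __init__(self, fifo_words):
        self.fifo = deque(fifo_words)  # 64-bit words fetched from DRAM
        self.bit_select = 0            # assumed: 0 addresses bit 0

    @property
    def sfu_hcu_avail(self):
        return bool(self.fifo)

    @property
    def sfu_hcu_sdata(self):
        return (self.fifo[0] >> self.bit_select) & 1

    def advdot(self, hrf_xadvance):
        """One hcu_sfu_advdot strobe; hrf_xadvance comes from the DAG."""
        if not self.sfu_hcu_avail:
            raise RuntimeError("hcu_sfu_advdot while sfu_hcu_avail is low")
        if hrf_xadvance:               # 0 would replicate the current dot
            self.bit_select += 1
            if self.bit_select == 64:  # whole 64-bit word consumed: pop it
                self.bit_select = 0
                self.fifo.popleft()
```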

The last 256-bit word for a line read from DRAM and written into the HCUReadLineFIFO can contain dots or extra padding which should not be output to the HCU. A counter in the HCUReadLineFIFO, hcuadvdot_count[15:0], counts the number of hcu_sfu_advdot strobes received from the HCU. When the count equals hcu_num_dots[15:0] the HCUReadLineFIFO must adjust the FIFO read address to point to the next 256-bit word boundary in the FIFO. This can be achieved by considering the FIFO read address, read_adr[2:0], which will require 3 bits to address 8 locations of 64 bits. The next 256-bit aligned address is calculated by inverting the MSB of read_adr and setting all other bits to 0.

    if (hcuadvdot_count == hcu_num_dots) then
        read_adr[1:0] = b00
        read_adr[2] = ~read_adr[2]

The DIU Interface and Address Generator sub-block scaling unit also needs to know when hcuadvdot_count equals hcu_num_dots. This condition is exported from the HCUReadLineFIFO as the signal hrf_hcuendofline. When hrf_hcuendofline is asserted the scaling unit will decide, based on vertical scaling, whether to go back to the start of the current line or go on to the next line.

25.8.9.2 DRAM Access Limitation

The SFU must output 1 bit/cycle to the HCU. Since HCUNumDots may not be a multiple of 256 bits, the last 256-bit DRAM word on the line can contain extra zeros. In this case, the SFU may not be able to provide 1 bit/cycle to the HCU. This could lead to a stall by the SFU. This stall could then propagate if the margins being used by the HCU are not sufficient to hide it. The maximum stall can be estimated by the calculation: DRAM service period - (X scale factor * dots used from the last DRAM read for the HCU line).
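The stall estimate can be evaluated with the sketch below. It takes the X scale factor to be XscaleNum / XscaleDenom (an assumption based on the register names), assumes one bit per dot when finding how many dots fall in the last DRAM word, and clamps negative results to zero; the service period passed in is a caller-supplied value, and the figures in the tests are hypothetical.

```python
from fractions import Fraction


def dots_in_last_word(hcu_num_dots):
    """Dots of an HCU line that fall in its last 256-bit DRAM word."""
    remainder = hcu_num_dots % 256
    return remainder if remainder else 256


def max_stall_cycles(dram_service_period, xscale_num, xscale_denom, dots_used):
    """DRAM service period - X scale factor * dots used from last DRAM read.

    A result of zero means the stall is fully hidden.
    """
    x_scale = Fraction(xscale_num, xscale_denom)
    return max(Fraction(0), dram_service_period - x_scale * dots_used)
```

For example, with a hypothetical 256-cycle service period and no scaling, a 1000-dot line leaves 232 dots in its last word, so the estimated stall is 24 cycles.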
25.8.10 DIU Interface and Address Generator Sub-block

Table 119. DIU Interface and Address Generator Additional IO Description

Port name | Pins | I/O | Description

Internal LBDPrevLineFIFO Inputs
plf_diurreq | 1 | In | Signal indicating the LBDPrevLineFIFO has 256 bits of data free.
plf_diurack | 1 | Out | Acknowledge that read request has been accepted and plf_diurreq should be deasserted.
plf_diurdata | 64 | Out | Data from the DIU to LBDPrevLineFIFO. First 64 bits are bits 63:0 of the 256-bit word; second 64 bits are bits 127:64; third 64 bits are bits 191:128; fourth 64 bits are bits 255:192.
plf_diurvalid | 1 | Out | Signal indicating data on plf_diurdata is valid.
plf_diuidle | 1 | In | Signal indicating DIU state machine is in the IDLE state.

Internal LBDNextLineFIFO Inputs
nlf_diuwreq | 1 | In | Signal indicating the LBDNextLineFIFO has 256 bits of data for writing to the DIU.
nlf_diuwack | 1 | Out | Acknowledge from DIU that write request has been accepted and write data can be output on nlf_diuwdata together with nlf_diuwvalid.
nlf_diuwdata | 64 | In | Data from LBDNextLineFIFO to DIU Interface. First 64 bits are bits 63:0 of the 256-bit word; second 64 bits are bits 127:64; third 64 bits are bits 191:128; fourth 64 bits are bits 255:192.
nlf_diuwvalid | 1 | In | Signal indicating that data on nlf_diuwdata is valid.

Internal HCUReadLineFIFO Inputs
hrf_hcuendofline | 1 | In | Signal lasting 1 cycle indicating the end of the HCU read line.
hrf_xadvance | 1 | Out | Signal from horizontal scaling unit: 1 - supply the next dot; 0 - supply the current dot.
hrf_diurreq | 1 | In | Signal indicating the HCUReadLineFIFO has space for 256 bits of DIU data.
hrf_diurack | 1 | Out | Acknowledge that read request has been accepted and hrf_diurreq should be deasserted.
hrf_diurdata | 64 | Out | Data from DIU to HCUReadLineFIFO. First 64 bits are bits 63:0 of the 256-bit word; second 64 bits are bits 127:64; third 64 bits are bits 191:128; fourth 64 bits are bits 255:192.
hrf_diurvalid | 1 | Out | Signal indicating data on hrf_diurdata is valid.
hrf_diuidle | 1 | In | Signal indicating DIU state machine is in the IDLE state.

25.8.10.1 General Description

The DIU Interface and Address Generator (DAG) sub-block manages the bi-level buffer in DRAM. It has a DIU Write Interface for the LBDNextLineFIFO and a DIU Read Interface shared between the HCUReadLineFIFO and LBDPrevLineFIFO.

All DRAM address management is centralised in the DAG. DRAM access is pre-emptive, i.e. after a FIFO unit has made an access, then as soon as the FIFO has space to read or data to write, a DIU access will be requested immediately. This ensures there are no unnecessary stalls introduced, e.g. at the end of an LBD or HCU line.

The horizontal and vertical non-integer scaling logic is completely contained in the DAG sub-block. The scaling control unit exports the hrf_xadvance signal to the HCUReadLineFIFO, which indicates whether to replicate the current dot or supply the next dot for horizontal scaling.
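One common way to generate hrf_xadvance for a rational scale factor is a DDA-style counter, sketched below as an assumption. This is a generic technique, not necessarily the document's own scaling pseudo-code. For each output dot the counter is incremented by XscaleDenom; when it reaches XscaleNum it wraps and the source dot advances. Each source dot is therefore replicated XscaleNum / XscaleDenom times on average, and seeding the counter with XstartCount shortens the first (clipped) dot. Vertical scaling could apply the same idea per line.

```python
def x_scale_sequence(num_src_dots, xscale_num, xscale_denom, xstart_count=0):
    """Generic DDA sketch of horizontal dot replication (scale factor >= 1).

    Returns, for each output dot, the index of the source dot emitted.
    """
    if xscale_denom > xscale_num:
        raise ValueError("sketch only covers scale factors >= 1")
    out = []
    src = 0
    count = xstart_count          # loaded at the start of every line
    while src < num_src_dots:
        out.append(src)
        count += xscale_denom
        if count >= xscale_num:   # corresponds to hrf_xadvance = 1
            count -= xscale_num
            src += 1
    return out
```

With XscaleNum = 3 and XscaleDenom = 2, two source dots produce three output dots; with XstartCount = 2 and a factor of 3, the first dot is emitted once and the second three times.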

25.8.10.2 DIU Write Interface

The LBDNextLineFIFO generates all the DIU write interface signals directly except for sfu_diu_wadr[21:5], which is generated by the Address Generation logic.

The DIU request from the LBDNextLineFIFO will be negated if its respective address pointer in DRAM is invalid, i.e. nlf_adrvalid = 0. The implementation must ensure that no erroneous requests occur on sfu_diu_wreq.
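The DAG manages the DRAM address pointers for the bi-level buffer bounded by StartSfuAdr and EndSfuAdr. The sketch below assumes that buffer is used circularly, with EndSfuAdr inclusive, so a pointer wraps from EndSfuAdr back to StartSfuAdr. The wrap rule and the helper names are assumptions, and the address-valid checks (nlf_adrvalid, hrf_adrvalid, plf_adrvalid) are not modelled.

```python
def next_sfu_adr(adr, start_sfu_adr, end_sfu_adr):
    """Advance a 256-bit-aligned DRAM word pointer within the SFU buffer."""
    if not start_sfu_adr <= adr <= end_sfu_adr:
        raise ValueError("pointer lies outside the SFU buffer")
    return start_sfu_adr if adr == end_sfu_adr else adr + 1


def advance_by(adr, n_words, start_sfu_adr, end_sfu_adr):
    """Advance a pointer by n_words with the same (assumed) wrap-around."""
    size = end_sfu_adr - start_sfu_adr + 1
    return start_sfu_adr + (adr - start_sfu_adr + n_words) % size
```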



nM_dluwreq 
ntf_adrvalld. 



& 



wrlt9_r©q 



nlf_dnjwack 



nILditiwdata 64 



nlf_diuwvaf]d 



H 

DIU 

4 Write 



Internee ^ 



shj_dhi_wreq 



dfu_sfii_wack 



64 



y > sfu_dJu_data[63:0) 



sfu_dlu_wvalld 



25.8. 10.3 DIU Read Interface 



Figure 134. DIU Write Interface 



Both HCUReadLineFIFO and LBDPrevLineFIFO share the read interface. If both sources request simultaneously then the arbitration logic implements a round-robin sharing of read accesses between the HCUReadLineFIFO and LBDPrevLineFIFO.

The DIU read request arbitration logic generates a signal, select_hrfplf, which indicates whether the DIU access is from the HCUReadLineFIFO or LBDPrevLineFIFO (0 = HCUReadLineFIFO, 1 = LBDPrevLineFIFO). Figure 135 shows select_hrfplf multiplexing the returned DIU acknowledge and read data to either the HCUReadLineFIFO or LBDPrevLineFIFO.
Figure 135. DIU Read Interface multiplexing by select_hrfplf
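As a behavioural sketch of the multiplexing in Figure 135, the Python below steers the shared read-return signals to one FIFO according to select_hrfplf. It assumes the acknowledge, the valid strobe and the 64-bit data are all steered (the text names the acknowledge and the data); the `ReadReturn` type and `route_read_return` function are illustrative names, not SoPEC signals.

```python
from dataclasses import dataclass


@dataclass
class ReadReturn:
    rack: int    # read acknowledge seen by this FIFO
    rvalid: int  # read-data valid strobe seen by this FIFO
    rdata: int   # 64-bit read data seen by this FIFO


MASK64 = (1 << 64) - 1


def route_read_return(select_hrfplf, diu_sfu_rack, diu_sfu_rvalid, diu_sfu_data):
    """Steer the shared DIU read return to HCUReadLineFIFO (0) or LBDPrevLineFIFO (1).

    Returns an (hrf, plf) pair; the unselected FIFO sees all-zero signals.
    """
    active = ReadReturn(diu_sfu_rack, diu_sfu_rvalid, diu_sfu_data & MASK64)
    idle = ReadReturn(0, 0, 0)
    if select_hrfplf == 0:
        return active, idle
    return idle, active
```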

The DIU read request arbitration logic is shown in Figure 136. The arbitration logic will select a DIU read request on hrf_diurreq or plf_diurreq and assert sfu_diu_rreq, which goes to the DIU. The accompanying DIU read address is generated by the Address Generation Logic. The select signal select_hrfplf will be set according to the arbitration winner (0 = HCUReadLineFIFO, 1 = LBDPrevLineFIFO). sfu_diu_rreq is cleared when the DIU acknowledges the request on diu_sfu_rack. Arbitration cannot take place again until the DIU state-machine of the arbitration winner is in the idle state, indicated by diu_idle. This is necessary to ensure that the DIU read data is multiplexed back to the FIFO that requested it.



Figure 136. DIU read request arbitration logic
The DIU read requests from the HCUReadLineFIFO and LBDPrevLineFIFO will be negated if their respective addresses in DRAM are invalid, i.e. hrf_adrvalid = 0 or plf_adrvalid = 0. The implementation must ensure that no erroneous requests occur on sfu_diu_rreq.

If the HCUReadLineFIFO and LBDPrevLineFIFO request simultaneously, and the request does not immediately follow another DIU read port access, the arbitration logic will choose the HCUReadLineFIFO by default. If there are back to back requests to the DIU read port then the arbitration logic implements a round-robin sharing of read accesses between the HCUReadLineFIFO and LBDPrevLineFIFO.

A pseudo-code description of the DIU read arbitration is given below. 

// hrf is HCUReadLineFIFO, plf is LBDPrevLineFIFO

select_hrfplf = 0 // default choose hrf
history = none // no DIU read access immediately preceding

// state-machine is busy between asserting sfu_diu_rreq and diu_idle = 1
// if the requester state-machine is in idle state then de-assert busy
if (diu_idle == 1) then
    busy = 0

// if acknowledge received from DIU then de-assert DIU request
if (diu_sfu_rack == 1) then
    // de-assert request in response to acknowledge
    sfu_diu_rreq = 0

// if not busy then arbitrate between incoming requests
// if request detected then assert busy
if (busy == 0) then
    // if there is no request
    if (hrf_diurreq == 0) AND (plf_diurreq == 0) then
        sfu_diu_rreq = 0
        history = none
    // else there is a request
    else {
        // assert busy and request DIU read access
        busy = 1
        sfu_diu_rreq = 1
        // arbitrate in round-robin fashion between the requestors
        // if only HCUReadLineFIFO requesting choose HCUReadLineFIFO
        if (hrf_diurreq == 1) AND (plf_diurreq == 0) then
            history = hrf
            select_hrfplf = 0
        // if only LBDPrevLineFIFO requesting choose LBDPrevLineFIFO
        if (hrf_diurreq == 0) AND (plf_diurreq == 1) then
            history = plf
            select_hrfplf = 1
        // if both HCUReadLineFIFO and LBDPrevLineFIFO requesting
        if (hrf_diurreq == 1) AND (plf_diurreq == 1) then
            // no immediately preceding request choose HCUReadLineFIFO
            if (history == none) then
                history = hrf
                select_hrfplf = 0
            // if previous winner was HCUReadLineFIFO choose LBDPrevLineFIFO
            elsif (history == hrf) then
                history = plf
                select_hrfplf = 1
            // if previous winner was LBDPrevLineFIFO choose HCUReadLineFIFO
            elsif (history == plf) then
                history = hrf
                select_hrfplf = 0
    // end there is a request
    }
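The arbitration pseudo-code can be exercised as a cycle-level behavioural model. The Python sketch below is a direct transliteration, assuming one `step` call per clock with all four inputs sampled together; `ReadArbiter` and `step` are hypothetical names, not part of the SoPEC design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadArbiter:
    """Round-robin DIU read arbiter: select_hrfplf 0 = HCUReadLineFIFO, 1 = LBDPrevLineFIFO."""
    select_hrfplf: int = 0          # default choose hrf
    history: Optional[str] = None   # None, "hrf" or "plf"
    busy: int = 0
    sfu_diu_rreq: int = 0

    def step(self, hrf_diurreq, plf_diurreq, diu_sfu_rack, diu_idle):
        # requester state-machine back in idle: arbitration may happen again
        if diu_idle == 1:
            self.busy = 0
        # acknowledge from the DIU de-asserts the request
        if diu_sfu_rack == 1:
            self.sfu_diu_rreq = 0
        if self.busy == 0:
            if hrf_diurreq == 0 and plf_diurreq == 0:
                # no request: the next access will not be back to back
                self.sfu_diu_rreq = 0
                self.history = None
            else:
                self.busy = 1
                self.sfu_diu_rreq = 1
                if hrf_diurreq and not plf_diurreq:
                    winner = "hrf"
                elif plf_diurreq and not hrf_diurreq:
                    winner = "plf"
                else:
                    # both requesting: hrf by default, otherwise alternate
                    winner = "plf" if self.history == "hrf" else "hrf"
                self.history = winner
                self.select_hrfplf = 0 if winner == "hrf" else 1
        return self.select_hrfplf
```

Because arbitration only reopens once `diu_idle` is seen, the winner's read data is always returned while the `select_hrfplf` value chosen for it is still in place.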

25.8.10.4 Address Generation Logic

The DIU interface generates the DRAM addresses of data read and written by the SFU's FIFOs.

A write request from the LBDNextLineFIFO on nlf_diuwreq causes a write request from the DIU Write Interface. The Address Generator supplies the DRAM write address on sfu_diu_wadr[21:5].

A winning read request from the DIU read request arbitration logic causes a read request from the DIU Read Interface. The Address Generator supplies the DRAM read address on sfu_diu_radr[21:5].



Figure 137. Address Generation

The address generator is configured with the number of DRAM words to read in an HCU line, hcu_dram_words[7:0], the first DRAM address of the SFU area, start_sfu_adr[21:5], and the last DRAM address of the SFU area, end_sfu_adr[21:5].

Address Generation 

There are four address pointers used to manage the bi-level DRAM buffer:

a. hcu_readline_adr[21:5] is the read address in DRAM for the HCUReadLineFIFO.

b. hcu_startreadline_adr[21:5] is the start address in DRAM for the current line being read by the HCUReadLineFIFO.

c. lbd_nextline_adr[21:5] is the write address in DRAM for the LBDNextLineFIFO.

d. lbd_prevline_adr[21:5] is the read address in DRAM for the LBDPrevLineFIFO.

The current values of these address pointers are readable by the CPU.

Four corresponding address valid flags are required to indicate whether the address pointers are valid:

a. hrf_adrvalid.

b. hrf_start_adrvalid.

c. nlf_adrvalid.

d. plf_adrvalid.

DRAM requests from the FIFOs will not be issued to the DIU until the appropriate address flag is valid.

Once a request has been acknowledged, the address generation logic can calculate the address of the next 256-bit word in DRAM, ready for the next request.

Rules for address pointers 

The address pointers must obey certain rules which indicate whether they are valid:

a. hcu_readline_adr[21:5] is only valid if it is reading earlier in the line than lbd_nextline_adr[21:5] is writing, i.e. hrf_adrvalid = hcu_readline_adr[21:5] /= lbd_nextline_adr[21:5].

b. The SFU cannot overwrite the current line that the HCU is reading from, i.e. hrf_start_adrvalid = lbd_nextline_adr[21:5] /= hcu_startreadline_adr[21:5].

c. The LBDNextLineFIFO must be writing earlier in the line than LBDPrevLineFIFO is reading and must not overwrite the current line that the HCU is reading from, i.e. nlf_adrvalid = lbd_nextline_adr[21:5] /= lbd_prevline_adr[21:5] AND hrf_start_adrvalid.

d. The LBDPrevLineFIFO can read right up to the address that LBDNextLineFIFO is writing, i.e. plf_adrvalid = lbd_prevline_adr[21:5] /= lbd_nextline_adr[21:5].

e. At startup, i.e. when sfu_go is asserted, the pointers are reset to start_sfu_adr[21:5]. The first LBDNextLineFIFO data is allowed to be written to lbd_nextline_adr[21:5] even though nlf_adrvalid is initially invalid.

f. The address pointers can wrap around the SFU bi-level store area in DRAM. 
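Rules a to e can be collected into one combinational function. The sketch below assumes integer word addresses and uses hypothetical names (`address_valid_flags`, `nlf_write_ok`); `nlf_write_ok` models rule e by letting the first write after sfu_go through while `init` is set.

```python
def address_valid_flags(hcu_readline_adr, hcu_startreadline_adr,
                        lbd_nextline_adr, lbd_prevline_adr, init=False):
    """Evaluate the address-pointer validity rules a to e for one cycle."""
    # rule a: the HCU read pointer must not reach the LBD write pointer
    hrf_adrvalid = hcu_readline_adr != lbd_nextline_adr
    # rule b: LBD writes must not reach the start of the line the HCU is reading
    hrf_start_adrvalid = lbd_nextline_adr != hcu_startreadline_adr
    # rule c: the LBD write pointer must not reach the previous-line read pointer
    nlf_adrvalid = lbd_nextline_adr != lbd_prevline_adr and hrf_start_adrvalid
    # rule d: previous-line reads may run right up to the write pointer
    plf_adrvalid = lbd_prevline_adr != lbd_nextline_adr
    return {
        "hrf_adrvalid": hrf_adrvalid,
        "hrf_start_adrvalid": hrf_start_adrvalid,
        "nlf_adrvalid": nlf_adrvalid,
        "plf_adrvalid": plf_adrvalid,
        # rule e: the first write after sfu_go is allowed while init is set
        "nlf_write_ok": nlf_adrvalid or init,
    }
```

Right after sfu_go all four pointers are equal, so every flag is false and only the `init` path lets the LBDNextLineFIFO make progress.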
X scaling of data for HCUReadLineFIFO 

The signal hcu_sfu_advdot tells the HCUReadLineFIFO to supply the next dot or the current dot on sfu_hcu_sdata according to the hrf_xadvance signal from the scaling control unit. When hrf_xadvance is 1 the HCUReadLineFIFO should supply the next dot. When hrf_xadvance is 0 the HCUReadLineFIFO should supply the current dot.



Figure 138. X scaling control unit

The algorithm for non-integer scaling is described in the pseudocode below. Note, x_scale_count should be loaded with x_start_count after reset and at the end of each line. The end of the line is indicated by hrf_hcu_endofline from the HCUReadLineFIFO.

if (hcu_sfu_advdot == 1) then
    if (x_scale_count + x_scale_denom - x_scale_num >= 0) then
        x_scale_count = x_scale_count + x_scale_denom - x_scale_num
        hrf_xadvance = 1
    else
        x_scale_count = x_scale_count + x_scale_denom
        hrf_xadvance = 0
else
    x_scale_count = x_scale_count
    hrf_xadvance = 0
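The X scaling algorithm is a Bresenham-style accumulator: on each advdot strobe the count gains x_scale_denom, and x_scale_num is paid back each time the source dot advances. Over x_scale_num strobes the FIFO therefore advances x_scale_denom dots, stretching the line by x_scale_num/x_scale_denom. The Python sketch below assumes signed integer arithmetic and x_scale_num >= x_scale_denom (scaling up); `scale_advance` is a hypothetical helper. The Y scaling pseudo-code that follows has the same structure, so the same function would model hrf_yadvance if fed hrf_hcu_endofline strobes and started at y_scale_denom.

```python
def scale_advance(num, denom, start_count, strobes):
    """Return the advance flag (1 = next dot, 0 = repeat current dot) for each strobe."""
    count = start_count
    flags = []
    for strobe in strobes:
        if strobe:
            if count + denom - num >= 0:
                # enough credit accumulated: move on to the next source dot
                count += denom - num
                flags.append(1)
            else:
                # not enough credit: replicate the current source dot
                count += denom
                flags.append(0)
        else:
            # no strobe: count is held and no advance is signalled
            flags.append(0)
    return flags
```

For example, with x_scale_num = 3 and x_scale_denom = 2 starting from 0, six strobes advance the source four times, so every two source dots become three output dots.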

Y scaling of data for HCUReadLineFIFO 

The HCUReadLineFIFO counts the number of hcu_sfu_advdot strobes received from the HCU. When the count equals hcu_num_dots the HCUReadLineFIFO will assert hrf_hcu_endofline for a cycle.

The algorithm for non-integer scaling is described in the pseudocode below. Note, y_scale_count should be loaded with y_scale_denom after reset.



if (hrf_hcu_endofline == 1) then
    if (y_scale_count + y_scale_denom - y_scale_num >= 0) then
        y_scale_count = y_scale_count + y_scale_denom - y_scale_num
        hrf_yadvance = 1
    else
        y_scale_count = y_scale_count + y_scale_denom
        hrf_yadvance = 0
else
    y_scale_count = y_scale_count
    hrf_yadvance = 0



Figure 139. Y scaling control unit

When hrf_hcu_endofline is asserted the Y scaling unit will decide whether to go back to the start of the current line, by setting hrf_yadvance = 0, or go on to the next line, by setting hrf_yadvance = 1.

// if end of HCU line and advance to next line
if (hrf_hcu_endofline == 1) AND (hrf_yadvance == 1) then {
    // advance to start of next HCU line in DRAM
    hcu_startreadline_adr = hcu_startreadline_adr + hcu_dram_words
    // allow for address wraparound
    offset = hcu_startreadline_adr - end_sfu_adr
    if (offset >= 0) then
        hcu_startreadline_adr = start_sfu_adr + offset
    hcu_readline_adr = hcu_startreadline_adr
}
// if end of HCU line and return to start of current line
elsif (hrf_hcu_endofline == 1) AND (hrf_yadvance == 0) then
    hcu_readline_adr = hcu_startreadline_adr
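The line-start advance with wraparound can be sketched as a small helper. Because end_sfu_adr is described above as the last DRAM address of the SFU area, this sketch treats it as inclusive and subtracts one when wrapping, so the word just past end_sfu_adr maps to start_sfu_adr, matching the single-step wrap in the pointer-update pseudo-code; if end_sfu_adr were instead exclusive, the `start_sfu_adr + offset` form above would apply directly. `next_line_start` is a hypothetical name.

```python
def next_line_start(hcu_startreadline_adr, hcu_dram_words, start_sfu_adr, end_sfu_adr):
    """Advance the HCU line-start pointer by one line, wrapping within the SFU area.

    Assumes end_sfu_adr is the last valid word address (inclusive) and that
    hcu_dram_words does not exceed the size of the SFU area.
    """
    adr = hcu_startreadline_adr + hcu_dram_words
    if adr > end_sfu_adr:
        # words past end_sfu_adr continue from start_sfu_adr
        adr = start_sfu_adr + (adr - end_sfu_adr - 1)
    return adr
```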



Figure 140 shows an overview of X and Y scaling for HCU data. 
Figure 140. Overview of X and Y scaling at HCU interface

Address generator pseudo-code:

Initialization:

if (sfu_go rising edge) then
    // set flag to allow first write
    init = 1
    // initialise address pointers to start of SFU address space
    lbd_prevline_adr[21:5] = start_sfu_adr[21:5]
    lbd_nextline_adr[21:5] = start_sfu_adr[21:5]
    hcu_readline_adr[21:5] = start_sfu_adr[21:5]
    hcu_startreadline_adr[21:5] = start_sfu_adr[21:5]
// if first write complete
elsif (plf_adrvalid == 1) then
    // reset flag allowing first write
    init = 0



Address valid signals:

hrf_adrvalid = hcu_readline_adr != lbd_nextline_adr
hrf_start_adrvalid = lbd_nextline_adr != hcu_startreadline_adr

Address pointer updating:

// LBDNextLineFIFO
// if DIU write acknowledge and LBDNextLineFIFO address is valid
if (diu_sfu_wack == 1 AND nlf_adrvalid == 1) then
    // if end of SFU address range
    if (lbd_nextline_adr == end_sfu_adr) then
        // go to start of SFU address range
        lbd_nextline_adr = start_sfu_adr
    else
        // increment address pointer
        lbd_nextline_adr = lbd_nextline_adr + 1
// LBDPrevLineFIFO
// if DIU read acknowledge and LBDPrevLineFIFO address is valid
if (diu_sfu_rack == 1 AND select_hrfplf == 1 AND plf_adrvalid == 1) then
    if (lbd_prevline_adr == end_sfu_adr) then
        lbd_prevline_adr = start_sfu_adr
    else
        lbd_prevline_adr = lbd_prevline_adr + 1

// HCUReadLineFIFO
// if DIU read acknowledge and HCUReadLineFIFO address is valid
if (diu_sfu_rack == 1 AND select_hrfplf == 0 AND hrf_adrvalid == 1) then
    // if end of HCU line and advance to next line
    if (hrf_hcu_endofline == 1) AND (hrf_yadvance == 1) then {
        // advance to start of next HCU line in DRAM
        hcu_startreadline_adr = hcu_startreadline_adr + hcu_dram_words
        // allow for address wraparound
        offset = hcu_startreadline_adr - end_sfu_adr
        if (offset >= 0) then
            hcu_startreadline_adr = start_sfu_adr + offset
        hcu_readline_adr = hcu_startreadline_adr
    }
    // if end of HCU line and return to start of current line
    elsif (hrf_hcu_endofline == 1) AND (hrf_yadvance == 0) then
        hcu_readline_adr = hcu_startreadline_adr
    // if pointing to end of SFU address space
    elsif (hcu_readline_adr == end_sfu_adr) then
        // go to start of SFU address space
        hcu_readline_adr = start_sfu_adr
    else
        // increment address pointer
        hcu_readline_adr = hcu_readline_adr + 1
26 Tag Encoder (TE)



26.1 Overview 



[...] tags on a triangular grid, and can be programmed for both landscape and portrait orientations.

[...] is encoded into an arbitrary number [...]. The TE supports integer scaling in the Y-direction while the TFU supports integer scaling in the X-direction. [...]

The output from the TE is buffered in the Tag FIFO Unit (TFU), which is in turn used as input by the HCU. In addition, a te_finishedband signal is output to the end of band unit once the input has been loaded from DRAM. The high level data path is shown by the block diagram in Figure 141.



Figure 141. High level block diagram of TE in context (blocks: DRAM interface, PCU, tag encoder, tag FIFO unit, halftoner/compositor; output: te_finishedband)

[...] printed with an infrared-absorptive ink that [...] sensing device. Since black ink can be IR absorptive, limited functionality can be provided on offset-printed pages using black ink on otherwise blank areas of the page, for example to encode buttons. Alternatively an invisible infrared ink can be used to print the position tags [...] taken to ensure that [...].

[...] is reused from PEC1, the SoPEC TE over-produces by a factor of 2.

[...] processes 2 dots per cycle, the tag data interface has [...] accomplished in approximately [...] PEC1. If the SoPEC TE were to be modified from two dots production per cycle to a nominal one dot per cycle it should not lose the 63/52 cycle performance edge attained in PEC1.
26.2 What are tags?



The first barcode was described in the late 1940's by Woodland and Silver, and finally patented in 1952 (US Patent 2,612,994) when electronic parts were scarce and very expensive. Now however, with the advent of cheap and readily available computer technology, nearly every item purchased from a shop contains a barcode of some description on the packaging. From books to CDs, to grocery items, the barcode provides a convenient way of identifying an object by a product number. The exact interpretation of the product number depends on the type of barcode. Warehouse inventory tracking systems let users define their own product number ranges, while inventory in shops must be more universally encoded so that products from one company don't overlap with products from another company. Universal Product Codes (UPC) were introduced in the mid 1970's at the request of the National Association of Food Chains for this very reason.

Barcodes themselves have been specified in a large number of formats. The older barcode formats contain characters that are displayed in the form of lines. The combination of black and white lines describes the information the barcode contains. Often there are two types of lines to form the complete barcode: the characters (the information itself) and lines to separate blocks for better optical recognition. While the information may change from barcode to barcode, the lines to separate blocks stay constant. The lines to separate blocks can therefore be thought of as part of the constant structural components of the barcode.

Barcodes are read with specialized reading devices that then pass the extracted data onto the computer for further processing. For example, a point-of-sale scanning device allows the sales assistant to add the scanned item to the current sale, places the name of the item and the price on a display device for verification, etc. Light-pens, gun readers, scanners, slot readers, and cameras are among the many devices used to read the barcodes.

To help ensure that the data extracted was read correctly, checksums were introduced as a crude form of 
error detection. More recent barcode formats, such as the Aztec 2D barcode developed by Andy Longacre 
in 1995 (US patent number US5591956), but now released to the public domain, use redundancy encoding 
schemes such as Reed-Solomon. Reed Solomon encoding is adequately discussed in [24], [26] and [30]. 
The reader is advised to refer to these sources for background information. Very often the degree of redun- 
dancy encoding is user selectable. 

More recently there has also been a move from the simple one dimensional barcodes (line based) to two dimensional barcodes. Instead of storing the information as a series of lines, where the data can be extracted from a single dimension, the information is encoded in two dimensions. Just as with the original barcodes, the 2D barcode contains both information and structural components for better optical recognition. Figure 142 shows an example of a QR Code (Quick Response Code), developed by Denso of Japan (US patent number US5726435). Note the barcode cell consists of two areas: a data area (depends on the data being stored in the barcode), and a constant position detection pattern. The constant position detection pattern is used by the reader to help locate the cell itself, then to locate the cell boundaries, and to allow the reader to determine the original orientation of the cell (orientation can be determined by the fact that there is no 4th corner pattern).




Figure 142. Example QR Code developed by Denso of Japan

The number of barcode encoding schemes grows daily. Yet very often the hardware for producing these 
barcodes is specific to the particular barcode format. As printers become more and more embedded, there 
is an increasing desire for real-time printing of these barcodes. In particular, Netpage enabled applications 
require the printing of 2D barcodes (or tags) over the page, preferably in infra-red ink. The tag encoder in 
SoPEC uses a generic barcode format encoding scheme which is particularly suited to real-time printing. 
Since the barcode encoding format is generic, the same rendering hardware engine can be used to produce 
a wide variety of barcode formats. 

Unfortunately the term "barcode" is interpreted in different ways by different people. Sometimes it refers only to the data area component, and does not include the constant position detection pattern. In other cases it refers to both data and constant position detection pattern.

We therefore use the term tag to refer to the combination of data and any other components (such as posi- 
tion detection pattern, blank space etc. surround) that must be rendered to help hold or locate/read the data. 
A tag therefore contains the following components: 

• data area(s). The data area is the whole reason that the tag exists. The tag data area(s) contain the encoded data (optionally redundancy-encoded, perhaps simply checksummed) where the bits of the data are placed within the data area at locations specified by the tag encoding scheme.

• constant background patterns, which typically include a constant position detection pattern. These help the tag reader to locate the tag. They include components that are easy to locate and may contain orientation and perspective information in the case of 2D tags. Constant background patterns may also include such patterns as a blank area surrounding the data area or position detection pattern. These blank patterns can aid in the decoding of the data by ensuring that there is no interference between tags or data areas.

In most tag encoding schemes there is at least some constant background pattern, but it is not necessarily 
required by all. For example, if the tag data area is enclosed by a physical space and the reading means 
uses a non-optical location mechanism (e.g. physical alignment of surface to data reader) then a position
detection pattern is not required. 

Different tag encoding schemes have different sized tags, and have different allocation of physical tag area to constant position detection pattern and data area. For example, the QR code has 3 fixed blocks at the edges of the tag for position detection pattern (see Figure 142) and a data area in the remainder. By contrast, the Netpage tag structure (see Figures 143 and 144) contains a circular locator component, an orientation feature, and several data areas. Figure 143(a) shows the Netpage tag constant background pattern in a resolution independent form. Figure 143(b) is the same as Figure 143(a), but with the addition of the data areas to the Netpage tag. Figure 144 is an example of dot placement and rendering to 1600 dpi for a Netpage tag. Note that in Figure 144 a single bit of data is represented by many physical output dots to form a block within the data area.





(a) Netpage tag background pattern    (b) Netpage tag showing data area

Figure 143. Netpage tag structure



Figure 144. Netpage tag with data rendered at 1600 dpi (magnified view) 
26.2.1 Contents of the data area 

The data area contains the data for the tag. 

Depending on the tag's encoding format, a single bit of data may be represented by a number of physical printed dots. The exact number of dots will depend on the output resolution and the target reading/scanning resolution. For example, in the QR code (see Figure 142), a single bit is represented by a dark module or a light module, where the exact number of dots in the dark module or light module depends on the rendering resolution and target reading/scanning resolution. For example, a dark module may be represented by a square block of printed dots (all on for binary 1, or all off for binary 0), as shown in Figure 145.




Figure 145. Example of 2x2 dots for each block of QR code 
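To make the module-to-dot mapping concrete, the sketch below expands a grid of data bits into printed dots, rendering each bit as a square block of dots. It assumes square modules and a row-major list-of-lists bitmap; `expand_modules` is a hypothetical helper, and real tag formats may use non-square shapes, as the next paragraph notes.

```python
def expand_modules(bits, block):
    """Render each data bit as a block-by-block square of identical dots."""
    rows = []
    for bit_row in bits:
        # replicate each bit horizontally
        dot_row = [bit for bit in bit_row for _ in range(block)]
        # replicate the resulting dot line vertically
        rows.extend(list(dot_row) for _ in range(block))
    return rows
```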

The point to note here is that a single bit of data may be represented in the printed tag by an arbitrary 
printed shape. The smallest shape is a single printed dot, while the largest shape is theoretically the whole 
tag itself, for example a giant macrodot comprised of many printed dots in both dimensions. 

An ideal generic tag definition structure allows the generation of an arbitrary printed shape from each bit 
of data. 

26.2.2 What do the bits represent? 

Given an original number of bits of data, and the desire to place those bits into a printed tag for subsequent 
retrieval via a reading/scanning mechanism, the original number of bits can either be placed directly into 
the tag, or they can be redundancy-encoded in some way. The exact form of redundancy encoding will 
depend on the tag format. 

The placement of data bits within the data area of the tag is directly related to the redundancy mechanism 
employed in the encoding scheme. The idea is generally to place data bits together in 2D so that burst 
errors are averaged out over the tag data, thus typically being correctable. For example, all the bits of a
Reed-Solomon codeword would be spread out over the entire tag data area so as to minimize being affected
by a burst error.

Since the data encoding scheme and shape and size of the tag data area are closely linked, it is desirable to 
have a generic tag format structure. This allows the same data structure and rendering embodiment to be 
used to render a variety of tag formats. 

26.2.2.1 Fixed and variable data components

In many cases, the tag data can be reasonably divided into fixed and variable components. For example, if a tag holds N bits of data, some of these bits may be fixed for all tags while some may vary from tag to tag.

For example, the Universal Product Code allows a country code and a company code. Since these bits don't change from tag to tag, these bits can be defined as fixed, and don't need to be provided to the tag encoder each time, thereby reducing the bandwidth when producing many tags.
Another example is Netpage tags. A single printed page contains a number of Netpage tags. The page-id
will be constant across all the tags, even though the remainder of the data within each tag may be different 
for each tag. By reducing the amount of variable data being passed to SoPEC's tag encoder for each tag, 
the overall bandwidth can be reduced. 

Depending on the embodiment of the tag encoder, these parameters will be either implicit or explicit, and 
may limit the size of tags renderable by the system. For example, a software tag encoder may be com- 
pletely variable, while a hardware tag encoder such as SoPEC's tag encoder may have a maximum number 
of tag data bits. 

26.2.2.2 Redundancy-encode the tag data within the tag encoder

Instead of accepting the complete number of TagData bits encoded by an external encoder, the tag encoder 
accepts the basic non-redundancy-encoded data bits and encodes them as required for each tag. This leads 
to significant savings of bandwidth and on-chip storage. 

In SoPEC's case for Netpage tags, only 120 bits of original data are provided per tag, and the tag encoder encodes these 120 bits into 360 bits. By having the redundancy encoder on board the tag encoder the effective bandwidth and internal storage required is reduced to only 33% of what would be required if the encoded data was read directly.



26.3 Placement of tags on a page 

The TE places tags on the page in a triangular grid arrangement as shown in Figure 146. 



Figure 146. Placement of tags for portrait & landscape printing (panels: portrait orientation, landscape orientation; axes: dot direction, line direction)



The triangular mesh of tags combined with the restriction of no overlap of columns or rows of tags means that the process of tag placement is greatly simplified. For a given line of dots, all the tags on that line correspond to the same part of the general tag structure. The triangular placement can be considered as alternate lines of tags, where one line of tags is inset by one amount in the dot dimension, and the other line of tags is inset by a different amount. The dot inter-tag gap is the same in both lines of tags, and is different from the line inter-tag gap.

Note also that as long as the tags themselves can be rotated, portrait and landscape printing are essentially 
the same - the placement parameters of line and dot are swapped, but the placement mechanism is the 
same. 
The general case for placement of tags therefore relies on a number of parameters, as shown in Figure 147.

Figure 147. General representation of tag placement (labels: start position, AltTagLine position, tag width, tag height, dot inter-tag gap, line inter-tag gap, tag within tag's bounding box; axes: dot direction, line direction)
The tag placement parameters and their restrictions are listed in Table 120.

Table 120. Tag placement parameters

Parameter             Description                                                        Restrictions
Tag height            The number of dot lines in a tag's bounding box.                   minimum = 1
Tag width             The number of dots in a single line of the tag's bounding box.     minimum = 1
                      The number of dots in the tag itself may vary depending on the
                      shape of the tag, but the number of dots in the bounding box
                      will be constant (by definition).
Dot inter-tag gap     The number of dots from the edge of one tag's bounding box to      minimum = 0
                      the start of the next tag's bounding box, in the dot direction.
Line inter-tag gap    The number of dot lines from the edge of one tag's bounding box    minimum = 0
                      to the start of the next tag's bounding box, in the line
                      direction.
Start Position        Defines the status of the top left dot on the page. Is an offset
                      in dot & row within the tag or the inter-tag gap.
AltTagLine Position   Defines the status for the start of the alternate row of tags.
                      Is an offset in dot within the tag or within the dot inter-tag
                      gap (the row position is always 0).
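Under one reading of Table 120, the placement parameters define a periodic grid: tags repeat every tag width + dot inter-tag gap dots and every tag height + line inter-tag gap lines, with alternate rows of tags shifted to give the triangular arrangement. The sketch below maps a page (line, dot) position to a coordinate inside a tag's bounding box, or None if it falls in an inter-tag gap. It assumes Start Position is a (dot, line) offset into that period and AltTagLine Position is the dot offset used on odd tag rows; these interpretations and the name `tag_position` are assumptions, not the TE's register definitions.

```python
from typing import Optional, Tuple


def tag_position(line: int, dot: int, tag_width: int, tag_height: int,
                 dot_gap: int, line_gap: int, start_dot: int, start_line: int,
                 alt_dot: int) -> Optional[Tuple[int, int]]:
    """Map a page (line, dot) to (x, y) inside a tag's bounding box, or None in a gap."""
    pitch_x = tag_width + dot_gap      # dots from one tag to the next along a line
    pitch_y = tag_height + line_gap    # lines from one row of tags to the next
    tag_row, y_in = divmod(line + start_line, pitch_y)
    # alternate rows of tags are shifted to form the triangular grid
    offset = start_dot if tag_row % 2 == 0 else alt_dot
    x_in = (dot + offset) % pitch_x
    if y_in < tag_height and x_in < tag_width:
        return x_in, y_in
    return None
```

Because every tag on a given dot line shares the same y_in, the per-line work reduces to stepping x_in, which is the simplification the triangular, non-overlapping layout provides.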





26.4 Basic tag encoding parameters

[...] restrictions on tag encoding parameters as a direct result of on-chip buffer sizes. Table 121 lists the encoding parameters as well as range restrictions where appropriate.

Although the restrictions were chosen to take the most likely encoding scenarios into account, it is a simple matter to adjust the buffer sizes and corresponding addressing to allow arbitrary encoding parameters in future implementations.



Table 121. Encoding parameters

Symbol   Parameter                                    Value / restriction
W        page width                                   2^14 dotpairs, or 20.48 inches at 1600 dpi
S        tag size                                     typical tag size is 2 mm x 2 mm;
                                                      maximum tag size is 384 dots x 384 dots
                                                      before scaling, i.e. 6 mm x 6 mm at 1600 dpi
N        number of dots in each dimension of the tag  384 dots before scaling
E        redundancy encoding for tag data             Reed-Solomon GF(2^4) at 5:10 or 7:8
Df       size of fixed data (unencoded)               40 or 56 bits
Rf       size of redundancy-encoded fixed data        120 bits
Dv       size of variable data (unencoded)            120 or 112 bits
Rv       size of redundancy-encoded variable data     360 or 240 bits
T        tags per page width                          85 packed 6 mm x 6 mm tags (384 x 384 dots)
                                                      will fit in 20.48 inches


The fixed data for the tags on a page need only be supplied to the TE once. It can be supplied as 40 or 56 bits of unencoded data and encoded within the TE as described in Section 26.4.1. Alternatively it can be supplied as 120 bits of pre-encoded data (encoded arbitrarily).

The variable data for the tags on a page are those 112 or 120 data bits that are variable for each tag. Variable tag data is supplied as part of the band data, and is always encoded by the TE as described in Section 26.4.1, but may itself be arbitrarily pre-encoded.

26.4.1 Redundancy encoding 

The mapping of data bits (both fixed and variable) to redundancy encoded bits relies heavily on the 
method of redundancy encoding employed. Reed-Solomon encoding was chosen for its ability to deal with 
burst errors and effectively detect and correct errors using a minimum of redundancy. Reed Solomon 
encoding is adequately discussed in [24], [26] and [30]. The reader is advised to refer to these sources for 
background information. 

In this implementation of the TE wc use Reed-Solomon encoding over the Galois Field GF(2^) Symbol 
size is 4 bits. Each codeword contains 15 4-bit symbols for a codeword length of 60 bits. The primitive 
polynomial isp(x)^x + x -f 1 . and the generator polynomial is g(x) - (;c+a)(jc+a^),..(x+a20, where / - the 
nimiber of symbols that can be corrected. 

Of the 15 symbols, there are two possibilities for encoding:

• RS(15, 5): 5 symbols of original data (20 bits) and 10 redundancy symbols (40 bits). The 10 redundancy
symbols mean that we can correct up to 5 symbols in error. The generator polynomial is therefore
$g(x) = (x+\alpha)(x+\alpha^2)\cdots(x+\alpha^{10})$.

• RS(15, 7): 7 symbols of original data (28 bits) and 8 redundancy symbols (32 bits). The 8 redundancy
symbols mean that we can correct up to 4 symbols in error. The generator polynomial is
$g(x) = (x+\alpha)(x+\alpha^2)\cdots(x+\alpha^{8})$.
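To make the two encodings concrete, the following is a minimal software sketch of a systematic Reed-Solomon encoder over GF($2^4$) built from $p(x) = x^4 + x + 1$, with generator roots $\alpha^1 \ldots \alpha^{2t}$. It is not the TE hardware. It assumes that $\alpha$ is the field element $x$ (value 2) and that a codeword is a list of 4-bit symbols, highest-degree coefficient first, with the data symbols followed by the redundancy symbols. The helper names (`gf_mul`, `generator_poly`, `rs_encode`, `poly_eval`) are hypothetical, and the TE's actual bit-to-symbol mapping is not described here.

```python
# Sketch of a systematic RS(15, k) encoder over GF(2^4), p(x) = x^4 + x + 1.
# Assumptions (not from the TE design): alpha = x (value 2), and codewords
# are symbol lists with the highest-degree coefficient first.

PRIM_POLY = 0b10011  # x^4 + x + 1
N = 15               # symbols per codeword (60 bits)

# Antilog (EXP) and log (LOG) tables for GF(16); EXP is doubled so that
# EXP[LOG[a] + LOG[b]] never needs a modulo.
EXP = [0] * (2 * N)
LOG = [0] * 16
_value = 1
for _i in range(N):
    EXP[_i] = _value
    LOG[_value] = _i
    _value <<= 1
    if _value & 0x10:        # reduce modulo p(x)
        _value ^= PRIM_POLY
for _i in range(N, 2 * N):
    EXP[_i] = EXP[_i - N]


def gf_mul(a, b):
    """Multiply two GF(16) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(num_parity):
    """g(x) = (x + a)(x + a^2)...(x + a^num_parity), highest degree first."""
    g = [1]
    for i in range(1, num_parity + 1):
        root = EXP[i]
        product = g + [0]                          # g(x) * x
        for j, coef in enumerate(g):
            product[j + 1] ^= gf_mul(coef, root)   # + g(x) * a^i
        g = product
    return g


def rs_encode(data, num_parity):
    """Append the remainder of data(x) * x^num_parity divided by g(x)."""
    g = generator_poly(num_parity)
    work = list(data) + [0] * num_parity
    for i in range(len(data)):
        coef = work[i]
        if coef:
            for j in range(1, len(g)):             # g is monic, so skip g[0]
                work[i + j] ^= gf_mul(g[j], coef)
    return list(data) + work[len(data):]


def poly_eval(poly, x):
    """Evaluate a highest-degree-first polynomial at x (Horner's rule)."""
    result = 0
    for coef in poly:
        result = gf_mul(result, x) ^ coef
    return result


# RS(15, 5): 5 data symbols and 10 redundancy symbols (corrects up to 5 symbols).
codeword = rs_encode([1, 2, 3, 4, 5], 10)
# A valid codeword vanishes at every root of g(x).
print(all(poly_eval(codeword, EXP[i]) == 0 for i in range(1, 11)))
```

The check at the end uses the defining property of the code: every codeword is a multiple of $g(x)$, so it evaluates to zero at $\alpha^1, \ldots, \alpha^{2t}$. RS(15, 7) is obtained by calling `rs_encode` with 7 data symbols and `num_parity=8`.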

In the first case, with 5 symbols of original data, the total amount of original data per tag is 160 bits (40
fixed, 120 variable). This is redundancy encoded to give a total amount of 480 bits (120 fixed, 360
variable) as follows:
• Each tag contains up to 40 bits of fixed original data. Therefore 2 codewords are required for the fixed
data, giving a total encoded data size of 120 bits. Note that this fixed data only needs to be encoded
once per page.

• Each tag contains up to 120 bits of variable original data. Therefore 6 codewords are required for the 
variable data, giving a total encoded data size of 360 bits. 

In the second case, with 7 symbols of original data, the total amount of original data per tag is 168 bits (56
fixed, 112 variable). This is redundancy encoded to give a total amount of 360 bits (120 fixed, 240
variable) as follows:

• Each tag contains up to 56 bits of fixed original data. Therefore 2 codewords are required for the fixed 
data, giving a total encoded data size of 120 bits. Note that this fixed data only needs to be encoded 
once per page. 

• Each tag contains up to 112 bits of variable original data. Therefore 4 codewords are required for the 
variable data, giving a total encoded data size of 240 bits. 

The choice of data to redundancy ratio depends on the application. 
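The sizes quoted above follow from rounding each block of data up to whole 60-bit codewords. A short Python sketch of that arithmetic, assuming the mode numbering of the DataRedun register (0 = 5 data symbols and 10 redundancy symbols, 1 = 7 data symbols and 8 redundancy symbols); the function name `encoded_bits` is hypothetical:

```python
import math

SYMBOL_BITS = 4
CODEWORD_SYMBOLS = 15
# DataRedun value -> data symbols per codeword (the rest are redundancy).
DATA_SYMBOLS = {0: 5, 1: 7}


def encoded_bits(raw_bits, data_redun):
    """Encoded size of raw_bits of data, rounded up to whole codewords."""
    data_bits_per_codeword = DATA_SYMBOLS[data_redun] * SYMBOL_BITS
    codewords = math.ceil(raw_bits / data_bits_per_codeword)
    return codewords * CODEWORD_SYMBOLS * SYMBOL_BITS


for mode, fixed, variable in ((0, 40, 120), (1, 56, 112)):
    print(mode, encoded_bits(fixed, mode), encoded_bits(variable, mode))
# prints "0 120 360" and then "1 120 240"
```

In both modes the fixed data occupies 2 codewords (120 bits), which is why the redundancy-encoded fixed data size does not depend on the chosen ratio.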



26.5 Data structures used by tag encoder

26.5.1 Tag Format Structure

The Tag Format Structure (TFS) is the template used to render tags, optimized so that the tag can be
rendered in real time. The TFS contains an entry for each dot position within the tag's bounding box. Each
entry specifies whether the dot is part of the constant background pattern or part of the tag's data
component (both fixed and variable).

The TFS is very similar to a bitmap in that it contains one entry for each dot position of the tag's bounding
box. The TFS therefore has TagHeight x TagWidth entries, where TagHeight matches the height of the
bounding box for the tag in the line dimension, and TagWidth matches the width of the bounding box for
the tag in the dot dimension. A single line of TFS entries for a tag is known as a tag line structure.

The TFS consists of TagHeight tag line structures, one for each 1600 dpi line in the tag's bounding box.
Each tag line structure contains three contiguous tables, known as tables A, B, and C. Table A contains
384 2-bit entries, one entry for each of the maximum number of dots in a single line of a tag (see
Table 121). The actual number of entries used should match the size of the bounding box for the tag in the
dot dimension, but all 384 entries must be present.¹ Table B contains 32 9-bit data addresses that refer to
(in order of appearance) the data dots present in the particular line. All 32 entries must be present, even
if fewer are used. Table C contains two 5-bit pointers into table B, and is stored in the 10 low bits of the
next 32-bit word (the upper 22 bits are unused). The total length of each tag line structure is therefore
34 x 32-bit words. Padding (18 x 32-bit words) is inserted after every 7 tag line structures to keep each tag
line structure completely within a 1 KByte boundary (thus a TFS containing TagHeight tag line structures
requires $\lceil TagHeight/7 \rceil$ KBytes). The structure of a TFS is shown in Figure 148.

¹ This is done so that it is possible to go from one line within a tag to the next by simply adding 33 in
32-bit based addressing to DRAM.
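Because 7 tag line structures of 34 words plus 18 padding words fill exactly 256 32-bit words (1 KByte), the DRAM offset of any tag line structure can be computed with a divide by 7 and two small multiplies. A minimal Python sketch of that offset arithmetic, assuming offsets are counted in 32-bit words from the start of the TFS; the constant and function names are hypothetical:

```python
WORD_BITS = 32
TABLE_A_WORDS = 384 * 2 // WORD_BITS   # 768 bits -> 24 words
TABLE_B_WORDS = 32 * 9 // WORD_BITS    # 288 bits -> 9 words
TABLE_C_WORDS = 1                      # two 5-bit pointers in the low 10 bits
LINE_WORDS = TABLE_A_WORDS + TABLE_B_WORDS + TABLE_C_WORDS  # 34 words

LINES_PER_KBYTE = 7
KBYTE_WORDS = 1024 * 8 // WORD_BITS    # 256 words
PAD_WORDS = KBYTE_WORDS - LINES_PER_KBYTE * LINE_WORDS       # 18 words


def tag_line_offset(line):
    """32-bit word offset of tag line structure `line` from the TFS start."""
    kbyte, index = divmod(line, LINES_PER_KBYTE)
    return kbyte * KBYTE_WORDS + index * LINE_WORDS


print(LINE_WORDS, PAD_WORDS, [tag_line_offset(n) for n in (0, 1, 6, 7, 8)])
# prints: 34 18 [0, 34, 204, 256, 290]
```

Tag line structure 6 ends at word 238, and the padding then moves tag line structure 7 to the start of the next KByte.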



[Figure 148: a TFS is a sequence of tag line structures 0 to n, with a reserved and unused block after
every seventh tag line structure (after tag line structure 6, before tag line structure 7). Each tag line
structure consists of table A (384 entries x 2 bits = 768 bits), table B (32 entries x 9 bits = 288 bits),
table C (2 entries x 5 bits = 10 bits) and 22 reserved, unused bits.]

Figure 148. Composition of SoPEC's tag format structure

A full description of the interpretation and usage of Tables A, B and C is given in section 26.8.3 on
page 414.



26.5.1.1 Scaling a tag

If the size of the printed dots is too small, then the tag can be scaled in one of several ways. The tag
itself can be scaled by N dots in each dimension, which increases the number of entries in the TFS.
Alternatively, the output from the TE can be scaled up by pixel replication, using a scale factor greater
than 1 in both the TE and the TFU.

For example, if the original TFS was 21 x 21 entries, and the scaling were a simple 2 x 2 dots for each of
the original dots, we could increase the TFS to be 42 x 42. To generate the new TFS from the old, we
would repeat each entry across each line of the TFS, and then we would repeat each line of the TFS. The
net number of entries in the TFS would be increased fourfold (2 x 2).
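The replication just described can be sketched in Python on a TFS modelled as a list of lines, each line a list of entries. This only illustrates the entry and line repetition; `scale_tfs` is a hypothetical helper, not part of the TE.

```python
def scale_tfs(tfs, n):
    """Repeat each entry n times along its line, then repeat each line n times."""
    scaled = []
    for line in tfs:
        widened = [entry for entry in line for _ in range(n)]
        for _ in range(n):
            scaled.append(list(widened))   # copy so lines can be edited independently
    return scaled


original = [[(row, col) for col in range(21)] for row in range(21)]
scaled = scale_tfs(original, 2)
print(len(scaled), len(scaled[0]))   # 42 42
print(sum(len(line) for line in scaled) // sum(len(line) for line in original))   # 4
```

Each scaled line is a separate copy, so a redesigned macrodot (as in the rounded-dot example that follows) could then be edited into individual entries without affecting the neighbouring lines.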

The TFS allows the creation of macrodots instead of simple scaling. Looking at Figure 149 for a simple
example of a 3 x 3 dot tag, we may want to produce a physically large printed form of the tag where each
of the original dots was represented by 7 x 7 printed dots. If we simply performed replication by 7 in each
dimension of the original TFS, either by increasing the size of the TFS by 7 in each dimension or by
putting a scale-up on the output of the tag generator, then we would have 9 sets of 7 x 7 square blocks.



Instead, we can replace each of the original dots in the TFS by a 7 x 7 dot definition of a rounded dot.
Figure 150 shows the results.
Figure 149. Simple 3x3 tag structure

Figure 150. 3x3 tag redesigned for 21 x 21 area (not simple replication)

Consequently, the higher the resolution of the TFS, the more printed dots can be printed for each
macrodot, where a macrodot represents a single data bit of the tag. The more dots that are available to
produce a macrodot, the more complex the pattern of the macrodot can be. As an example, Figure 144 on
page 360 shows the Netpage tag structure rendered such that the data bits are represented by an average of
8 dots x 8 dots (at 1600 dpi), but the actual shape of a dot is not square. This allows the printed Netpage
tag to be subsequently read at any orientation.

26.5.2 Raw tag data 

The TE requires a band of unencoded variable tag data if variable data is to be included in the tag
bit-plane. A band of unencoded variable tag data is a set of contiguous unencoded tag data records, in
order of encounter from the top left to the lower right of the printed band.

An unencoded tag data record is 128 bits arranged as follows: bits 0-111 or 0-119 are the bits of raw tag
data, bit 120 is a flag used by the TE (TagIsPrinted), and the remaining 7 bits are reserved (and should be
0). Having a record size of 128 bits simplifies the tag data access, since the data of two tags fits into a
256-bit DRAM word. It also means that the flags can be stored apart from the tag data, thus keeping the
raw tag data completely unrestricted. If there is an odd number of tags in a line, then the last DRAM read
will contain a tag in the first 128 bits and padding in the final 128 bits.

The TagIsPrinted flag allows the effective specification of a tag resolution mask over the page. For each
tag position, the TagIsPrinted flag determines whether any of the tag is printed or not. This allows
arbitrary placement of tags on the page. For example, tags may only be printed over particular active areas
of a page; the TagIsPrinted flag allows only those tags to be printed. TagIsPrinted is a 1-bit flag with
values as shown in Table 122.



Table 122. TagIsPrinted values

Value   Interpretation
0       Don't print the tag in this tag position.
        Output 0 for each dot within the tag bounding box.
1       Print the tag as specified by the various tag structures.
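As an illustration of the record layout, the following Python sketch splits one 256-bit DRAM word into its two 128-bit tag records and extracts the raw tag data and the TagIsPrinted flag from each. It assumes that the "first 128 bits" of the DRAM word are bits 127:0 and represents the words as Python integers; the function names are hypothetical.

```python
RECORD_BITS = 128
TAG_IS_PRINTED_BIT = 120


def split_dram_word(word):
    """Return (first_record, second_record) from a 256-bit DRAM word."""
    mask = (1 << RECORD_BITS) - 1
    return word & mask, (word >> RECORD_BITS) & mask


def unpack_record(record, data_bits=120):
    """Return (raw tag data, TagIsPrinted) for one 128-bit record.

    data_bits is 120 or 112, matching bits 0-119 or 0-111 of raw tag data.
    """
    data = record & ((1 << data_bits) - 1)
    tag_is_printed = (record >> TAG_IS_PRINTED_BIT) & 1
    return data, tag_is_printed


first = (1 << TAG_IS_PRINTED_BIT) | 0xABC   # tag is printed
second = 0x123                              # TagIsPrinted = 0: output 0 for every dot
first_rec, second_rec = split_dram_word(first | (second << RECORD_BITS))
print(unpack_record(first_rec), unpack_record(second_rec))   # (2748, 1) (291, 0)
```

Keeping the flag above the data bits means the data mask never has to special-case it, whichever of the two data widths is in use.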



26.5.3 DRAM storage requirements 

The total DRAM storage required by a single band of raw tag data depends on the number of tags present
in that band. Each tag requires 128 bits. Consequently, if there are $N$ tags in the band, the size in DRAM is
$N \times 128$ bits ($16N$ bytes).

The maximum size of a line of tags is 163 x 128 bits. When maximally packed, a row of tags contains 163 
tags (see Table 121) and extends over a minimum of 126 print lines. This equates to 282 KBytes over a 
Letter page. 

The total DRAM storage required by a single TFS is $\lceil TagHeight/7 \rceil$ KBytes (including
padding). Since the likely maximum value for TagHeight is 384 (given that SoPEC restricts TagWidth to 384),
the maximum size in DRAM for a TFS is 55 KBytes.
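Both storage rules reduce to one line of arithmetic each. A minimal Python sketch, with hypothetical function names; the 160-line and 80-line cases agree with the 23 KB and 12 KB figures given for 2.5 mm tags in Table 123.

```python
import math

TAG_RECORD_BYTES = 128 // 8    # each raw tag record is 128 bits
TFS_LINES_PER_KBYTE = 7        # 7 tag line structures plus padding per KByte


def raw_tag_band_bytes(num_tags):
    """DRAM bytes for one band of raw tag data holding num_tags tags."""
    return num_tags * TAG_RECORD_BYTES


def tfs_kbytes(tag_height):
    """DRAM KBytes for a TFS with tag_height tag line structures."""
    return math.ceil(tag_height / TFS_LINES_PER_KBYTE)


print(raw_tag_band_bytes(163), tfs_kbytes(384), tfs_kbytes(160), tfs_kbytes(80))
# prints: 2608 55 23 12
```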

26.5.4 DRAM access requirements 

The TE has two separate read interfaces to DRAM: one for raw tag data, TD, and one for the tag format
structure, TFS. The memory usage requirements are shown in Table 123. Raw tag data is stored in the
compressed page store.



Table 123. Memory usage requirements

Block                    Size
    Description

Compressed page store    2048 KBytes
    Compressed data page store for bi-level, contone and raw tag data.

Tag Format Structure     55 KBytes (384 dot line tags @ 1600 dpi)
    55 KBytes in PEC1 for 384 dot line tags (the benchmark) at 1600 dpi.
    2.5 mm tags (1/10th inch) @ 1600 dpi require 160 dot lines = 160/384 x 55, or 23 KBytes.
    2.5 mm tags @ 800 dpi require 80/384 x 55 = 12 KBytes.


The TD interface will read 256 bits from DRAM at a time. Each 256-bit read returns two 128-bit tags.
The TD interface to the DIU will be a 256-bit double buffer. If there is an odd number of tags in a line, then
the last DRAM read will contain a tag in the first 128 bits and padding in the final 128 bits.

The TFS interface will also read 256 bits from DRAM at a time. The TFS required for a line is 136 bytes.
A total of five 256-bit DRAM reads is required to read the TFS for a line, with 192 unused bits in the
fifth 256-bit word. A 136-byte double-line buffer will be implemented to store the TFS data.
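The read counts for both interfaces follow from dividing by the 256-bit DRAM word and rounding up. A minimal Python sketch, with hypothetical function names; the tag counts in the example (108 and 163 tags per line) are the figures used elsewhere in this chapter.

```python
import math

DIU_WORD_BITS = 256
TAG_RECORD_BITS = 128
TFS_LINE_BYTES = 136   # one tag line structure: 34 x 32-bit words


def tfs_reads_per_line():
    """Number of 256-bit reads for one tag line structure, and unused bits."""
    bits = TFS_LINE_BYTES * 8
    reads = math.ceil(bits / DIU_WORD_BITS)
    return reads, reads * DIU_WORD_BITS - bits


def td_reads_per_tag_row(num_tags):
    """256-bit reads for one line of tags; an odd count leaves a padded half word."""
    return math.ceil(num_tags * TAG_RECORD_BITS / DIU_WORD_BITS)


print(tfs_reads_per_line(), td_reads_per_tag_row(108), td_reads_per_tag_row(163))
# prints: (5, 192) 54 82
```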

The TE's DIU bandwidth requirements are summarized in Table 124.



Table 124. DRAM bandwidth requirements

Block    Direction    Average (bits/cycle)    Peak (bits/cycle)
    Description

TFS      Read         0.093                   0.093
    Single 256-bit reads (note 2). The TFS is 136 bytes, so there is unused data in the fifth
    256-bit read. A total of 5 reads is required.

Notes:
1: Each 2 mm tag lasts 126 dot cycles and requires 128 bits. This is a rate of 256 bits every 252 cycles.
2: 17 x 64-bit reads per line in PEC1 is 5 x 256-bit reads per line in SoPEC, with unused bits in the last
256-bit read.



26.5.5 Tag sizes 



SoPEC allows tags to be between 0 and 384 dots. A typical 2 mm tag requires 126 dots. Short tags do not
change the internal bandwidth or throughput behaviour at all. Tag height is specified so as to allow the
DRAM storage for raw tag data to be specified. Minimum tag width is a condition imposed by throughput
limitations: if the width is too small, the TE cannot consistently produce 2 dots per cycle across several
tags (there are also raw tag data bandwidth implications). Thinner tags still work; they just take longer
and/or need scaling.
26.6 Implementation



26.6.1 Tag Encoder Architecture 

A block diagram of the TE can be seen below. 
Figure 151. TE Block Diagram

The TE writes lines of bi-level tag plane data to the TFU for later reading by the HCU. The TE is
responsible for merging the encoded tag data with the tag structure (interpreted from the TFS). Y-integer
scaling of tags is performed in the TE, with X-integer scaling of the tags performed in the TFU. The encoded
tag layer is generated 2 bits at a time and output to the TFU at this rate. The HCU, however, only consumes
1 bit per cycle from the TFU. The TE must provide support for 126-dot tags (2 mm densely packed) with
108 tags per line and 128 bits per tag.
The tag encoder consists of a TFS interface that loads and decodes TFS entries, a tag data interface that
loads the raw tag data, encodes it and provides bit values on request, and a state machine to generate
addressing and control signals. The TE has two separate read interfaces to DRAM: one for raw tag data,
TD, and one for the tag format structure, TFS.

It is possible that the raw tag data interface, the TD, to the DIU could be replaced by a hardware state
machine at a [...] allow flexibility in the generation of tags. Support for Y scaling needs to be added to
the PEC1 TE. The PEC1 TE already allows stalling at its output during a line when tfu_te_oktowrite is
deasserted.



26.6.2 Y-Scaling output lines 



In order to support scaling in the Y direction, the following modifications to the PEC1 TE are suggested
for the Tag Data Interface, the Tag Format Structure Interface and the TE Top Level:

• For the Tag Data Interface: program the configuration registers of Table 126, FirstTagLineHeight and
TagMaxLine, with their true values, i.e. not multiplied up by the scale factor YScale. Within the Tag Data
Interface there are two counters, countx and county, that have a direct bearing on the rawTagDataAddr
generation. countx decrements as tags are read from DRAM; it is reset to NumTags[RtdTagSense] at the
start of each line of tags. county is decremented as each line of tags is completely read from DRAM, i.e.
when countx = 0. Scaling may be performed by counting the number of times countx reaches zero and only
decrementing county when this number reaches YScale. This will cause the Tag Data Interface to read each
line of tag data NumTags[RtdTagSense] x YScale times.

• For the Tag Format Structure Interface: the implication of Y-scaling for the TFS is that each Tag Line
Structure is used YScale times. This may be accomplished in either of two ways:

• For each Tag Line Structure, read it once from DRAM and reuse it YScale times. This involves gating
the control of TFS buffer flipping with YScale. Because of the way in which the advTfsLine and
advTagLine related functionality is coded in the PEC1 TFS, this solution is judged to be error-prone.

• Fetch each TagLineStructure YScale times. This solution involves controlling the activity of
currTfsAddr with YScale.

In SoPEC the TFS must supply five addresses to the DIU to read each individual Tag Line Structure,
one address for each of the 5 accesses. This is different from the behaviour in PEC1, where one address
is given and 17 data words were returned by the DIU. Since the behaviour of currTfsAddr must be
changed to meet the requirements of the SoPEC DIU, it makes sense to include the Y-scaling in this
change, i.e. a count of the number of completed sets of 5 accesses to the DIU is compared to YScale.
Only when this count equals YScale can currTfsAddr be loaded with the base address of the next line's
Tag Line Structure in DRAM; otherwise it is re-loaded with the base address of the current line's Tag
Line Structure in DRAM.

• For the Top Level: the Top Level of the TE has a counter, LinePos, which is used to count the number of
completed output lines when in a tag gap or in a line of tags. At the start (i.e. the top-left hand dot-pair)
of a gap or tag, LinePos is loaded with either TagGapLine or TagMaxLine. The value of LinePos is
decremented at the last dot-pair in a line. Y-scaling may be accomplished by gating the decrement of
LinePos based on the YScale value.
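The Tag Data Interface part of this proposal can be checked with a small behavioural model that counts how often each line of tags is read when county is only decremented after every YScale passes of countx down to zero. This is a hypothetical Python sketch, not the PEC1 RTL: it ignores FirstTagLineHeight, inter-tag gaps in the line dimension, RtdTagSense and DRAM addressing, and it assumes each row of tags is re-read once per output line (as the CountY register description in Table 126 suggests). The function and parameter names are hypothetical.

```python
def tag_row_reads(num_rows, tag_max_line, num_tags_minus1, y_scale):
    """Model of the countx/county scheme with Y-scaling (illustrative only).

    Returns (reads per row of tags, total tag records read). Each complete
    read of a line of tags is one pass of countx down to zero; county is only
    decremented after y_scale such passes.
    """
    reads_per_row = [0] * num_rows
    total_records = 0
    row = 0
    county = tag_max_line        # lines of this row still to produce, minus 1
    passes = 0                   # times countx has reached zero at this county
    while row < num_rows:
        countx = num_tags_minus1  # reset at the start of each line of tags
        while True:
            total_records += 1    # one tag record read from DRAM
            if countx == 0:
                break
            countx -= 1
        reads_per_row[row] += 1
        passes += 1
        if passes < y_scale:
            continue              # read the same line of tags again
        passes = 0
        if county == 0:
            row += 1              # this row of tags is complete
            county = tag_max_line
        else:
            county -= 1
    return reads_per_row, total_records


# 2 rows of 4 tags, each tag 3 lines tall, without and with Y-scaling by 2.
print(tag_row_reads(2, 2, 3, 1))   # ([3, 3], 24)
print(tag_row_reads(2, 2, 3, 2))   # ([6, 6], 48)
```

With YScale = 2 every line of tag data is read twice as often, which is what allows the TE to emit each output line twice without buffering a whole row of tags.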
26.6.3 TE Physical Hierarchy

[Figure 152: the Tag Encoder contains the top level (Top Level FSM + PCU + combinational logic for muxing
etc.); the Tag Data Interface, comprising the Raw Tag Data Interface, the Reed Solomon Encoder, the 2D
Decoder and the Encoded Tag Data Interface, which holds the encoded fixed tag data and the encoded
variable tag data; and the Tag Format Structure (TFS) block, comprising Table A, Table B and Table C with
registered outputs.]

Figure 152. TE Hierarchy

Figure 152 above illustrates the structural hierarchy of the TE. The top level contains the Tag Data
Interface (TDI), the Tag Format Structure (TFS), and an FSM to control the generation of dot pairs, along
with a clocked process to carry out the PCU read/write decoding. There is also some additional logic for
muxing the output data and generating other control signals.

At the highest level, the TE state machine processes the output lines of a page one line at a time, with the
starting position either in an inter-tag gap or in a tag (a SoPEC may be printing only part of a tag because
multiple SoPECs print a single line).

If the current position is within an inter-tag gap, an output of 0 is generated. If the current position is
within a tag, the tag format structure is used to determine the value of the output dot, using the
appropriate encoded data bit from the fixed or variable data buffers as necessary. The TE then advances
along the line of dots, moving through tags and inter-tag gaps according to the tag placement parameters.
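The per-line behaviour can be sketched in Python as a walk along one output line in dot pairs, switching between tag and inter-tag gap whenever the remaining count reaches zero. This is a simplified illustration, not the TE state machine. It uses the "minus 1" counter convention of the TagMaxDotpairs, TagGapDot and DotStartPos registers in Table 126. `tag_dot_pair` is a hypothetical callback standing in for the TFS lookup and encoded data, and `tag_gap_dot=None` models tightly packed tags.

```python
def generate_line(dot_pairs, tag_max_dotpairs, tag_gap_dot,
                  start_in_tag, start_pos, tag_dot_pair):
    """Return the (dot0, dot1) pairs for one output line.

    tag_max_dotpairs, tag_gap_dot and start_pos use the "minus 1" convention.
    tag_gap_dot is None when tags are tightly packed in the dot dimension.
    """
    out = []
    in_tag = start_in_tag
    pos = start_pos                       # dot pairs remaining, minus 1
    # Column within the tag, so that a line can start part-way through a tag.
    col = tag_max_dotpairs - start_pos if start_in_tag else 0
    for _ in range(dot_pairs):
        if in_tag:
            out.append(tag_dot_pair(col))
            col += 1
        else:
            out.append((0, 0))            # inter-tag gap: output 0
        if pos > 0:
            pos -= 1
        elif in_tag and tag_gap_dot is not None:
            in_tag, pos = False, tag_gap_dot              # tag ends, gap starts
        else:
            in_tag, pos, col = True, tag_max_dotpairs, 0  # next tag starts
    return out


# Tags 2 dot pairs wide with a 1 dot-pair gap, line starting at a tag edge.
print(generate_line(6, 1, 0, True, 1, lambda col: (1, col)))
# prints: [(1, 0), (1, 1), (0, 0), (1, 0), (1, 1), (0, 0)]
```

Starting part-way through a tag (non-zero column) covers the case mentioned above where one SoPEC prints only part of a tag at the edge of its section of the line.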
26.6.4 IO Definitions



Table 125. TE Port List

Port name                     Pins  I/O  Description


Clocks and Resets
pclk                          1     In   SoPEC functional clock.
prst_n                        1     In   Global reset signal.


Bandstore Signals
cdu_endofbandstore[21:5]      17    In   Address of the end of the current band of data.
                                         256-bit word aligned DRAM address.
cdu_startofbandstore[21:5]    17    In   Address of the start of the current band of data.
                                         256-bit word aligned DRAM address.
te_finishedband               1     Out  TE finished band signal to PCU and ICU.


PCU Interface data and control signals
pcu_addr[8:2]                 7     In   PCU address bus. 7 bits are required to decode the address space
                                         for this block.
pcu_dataout[31:0]             32    In   Shared write data bus from the PCU.
te_pcu_datain[31:0]           32    Out  Read data bus from the TE to the PCU.
pcu_rwn                       1     In   Common read/not-write signal from the PCU.
pcu_te_sel                    1     In   Block select from the PCU. When pcu_te_sel is high, both
                                         pcu_addr and pcu_dataout are valid.
te_pcu_rdy                    1     Out  Ready signal to the PCU. When te_pcu_rdy is high it indicates the
                                         last cycle of the access. For a write cycle this means pcu_dataout
                                         has been registered by the block, and for a read cycle this means
                                         the data on te_pcu_datain is valid.


TD (raw Tag Data) DIU Read Interface signals
td_diu_rreq                   1     Out  TD requests DRAM read. A read request must be accompanied by
                                         a valid read address.
td_diu_radr[21:5]             17    Out  TD read address to DIU.
                                         17 bits wide (256-bit aligned word).
diu_td_rack                   1     In   Acknowledge from DIU that TD read request has been accepted
                                         and new read address can be placed on td_diu_radr.
diu_data[63:0]                64    In   Data from DIU to TE.
                                         First 64 bits are bits 63:0 of 256-bit word;
                                         Second 64 bits are bits 127:64 of 256-bit word;
                                         Third 64 bits are bits 191:128 of 256-bit word;
                                         Fourth 64 bits are bits 255:192 of 256-bit word.
diu_td_rvalid                 1     In   Signal from DIU telling TD that valid read data is on the diu_data
                                         bus.


TFS (Tag Format Structure) DIU Read Interface signals
tfs_diu_rreq                  1     Out  TFS requests DRAM read. A read request must be accompanied
                                         by a valid read address.
tfs_diu_radr[21:5]            17    Out  TFS read address to DIU.
                                         17 bits wide (256-bit aligned word).
diu_tfs_rack                  1     In   Acknowledge from DIU that TFS read request has been accepted
                                         and new read address can be placed on tfs_diu_radr.
diu_data[63:0]                64    In   Data from DIU to TE.
                                         First 64 bits are bits 63:0 of 256-bit word;
                                         Second 64 bits are bits 127:64 of 256-bit word;
                                         Third 64 bits are bits 191:128 of 256-bit word;
                                         Fourth 64 bits are bits 255:192 of 256-bit word.
diu_tfs_rvalid                1     In   Signal from DIU telling TFS that valid read data is on the diu_data
                                         bus.


TFU Interface data and control signals
tfu_te_oktowrite              1     In   Ready signal indicating TFU has space available and is ready to be
                                         written to. Also asserted from the point that the TFU has received
                                         its expected number of bytes for a line until the next
                                         te_tfu_wradvline.
te_tfu_wdata[7:0]             8     Out  Write data for TFU.
te_tfu_wdatavalid             1     Out  Write data valid signal. This signal remains high whenever there is
                                         valid output data on te_tfu_wdata.
te_tfu_wradvline              1     Out  Advance line signal strobed when the last byte in a line is placed
                                         on te_tfu_wdata.



26.6.5 Configuration Registers 

The configuration registers in the TE are programmed via the PCU interface. Refer to section 21.8.2 on
page 257 for the description of the protocol and timing diagrams for reading and writing registers in the
TE. Note that since addresses in SoPEC are byte aligned and the PCU only supports 32-bit register reads
and writes, the lower 2 bits of the PCU address bus are not required to decode the address space for the
TE. Table 126 lists the configuration registers in the TE.

Registers which address DRAM are 64-bit DRAM word aligned, as this is the case for the PEC1 TE.
SoPEC assumes a 256-bit DRAM word size. If the TE can be easily modified, then the DRAM word
addressing should be modified to 256-bit word aligned addressing. Otherwise, software should program
these 64-bit word aligned addresses on a 256-bit DRAM word boundary.
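The two address conversions described above can be sketched in Python. The first drops the two byte-offset bits to form the 7-bit pcu_addr[8:2] register index. The second produces a 64-bit word address for a PEC1-style register while checking that it sits on a 256-bit (32-byte) DRAM word boundary. This is a minimal sketch: byte addressing of DRAM is an assumption, and the function names are hypothetical.

```python
def pcu_register_index(byte_addr):
    """Register index carried on pcu_addr[8:2] for a TE register byte offset."""
    if byte_addr % 4:
        raise ValueError("the PCU only supports 32-bit register accesses")
    return (byte_addr >> 2) & 0x7F       # 7 address bits decode the TE space


def dram_word64_addr(byte_addr):
    """64-bit word address for a 256-bit aligned DRAM byte address."""
    if byte_addr % 32:
        raise ValueError("address is not on a 256-bit DRAM word boundary")
    return byte_addr >> 3                # the low 2 bits of the result are then 0


print(hex(pcu_register_index(0x168)), dram_word64_addr(0x1000))   # 0x5a 512
```

Rejecting misaligned addresses in software is one way to honour the restriction above while the TE keeps its PEC1-style 64-bit word addressing.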



Table 126. TE Configuration Registers

Address         Register                    #bits  Reset
    Description

Control registers

0x00            Reset                       1      1
    A write to this register causes a reset of the TE.
    This register can be read to indicate the reset state:
    0 - reset in progress
    1 - reset not in progress

0x04            Go                          1      0
    Writing 1 to this register starts the TE. Writing 0 to this register halts the TE.
    When Go is deasserted, the state-machines go to their idle states but all counters and
    configuration registers keep their values.
    When Go is asserted, all counters are reset, but configuration registers keep their values
    (i.e. they don't get reset). NextBandEnable is cleared when Go is asserted.
    The TFU must be started before the TE is started.
    This register can be read to determine if the TE is running (1 = running, 0 = stopped).
Setup registers (constant for processing of a page)

0x40            TfsStartAdr                 19     0
    64-bit aligned DRAM address; should start at a 256-bit aligned location.
    Points to the first word of the first TFS line in DRAM.

0x44            TfsEndAdr                   19     0
    64-bit aligned DRAM address; should start at a 256-bit aligned location.
    Points to the first word of the last TFS line in DRAM.

0x48            TfsFirstLineAdr             19     0
    64-bit aligned DRAM address.
    Points to the first word of the first TFS line to be encountered on the page. If the start of
    the page is in an inter-tag gap, then this value will be the same as TfsStartAdr, since the
    first tag line reached will be the top line of a tag.

0x4C            DataRedun                   1      0
    Defines the data to redundancy ratio for the Reed-Solomon encoder. Symbol size is always
    4 bits; codeword size is always 15 symbols (60 bits).
    0 - 5 data symbols (20 bits), 10 redundancy symbols (40 bits)
    1 - 7 data symbols (28 bits), 8 redundancy symbols (32 bits)


0x50            Decode2DEn                  1      0
    Determines whether or not the data bits are to be 2D decoded rather than redundancy encoded
    (each 2 bits of the data bits becomes 4 output data bits).
    0 = redundancy encode data
    1 = decode each 2 bits of data into 4 bits

0x54            VariableDataPresent         1      0
    Defines whether or not there is variable data in the tags. If there is none, no attempt is
    made to read tag data, and tag encoding should only reference fixed tag data.

0x58            EncodeFixed                 1      0
    Determines whether or not the lower 40 (or 56) bits of fixed data should be encoded into
    120 bits or simply used as is.


0x5C            TagMaxDotpairs              8      0
    The width of a tag in dot-pairs, minus 1.
    Minimum = 0, Maximum = 191.

0x60            TagMaxLine                  9      0
    The number of lines in a tag, minus 1.
    Minimum = 0, Maximum = 383.

0x64            TagGapDot                   14     0
    The number of dot pairs between tags in the dot dimension, minus 1.
    Only valid if TagGapPresent[bit 0] = 1.

0x68            TagGapLine                  14     0
    Defines the number of dotlines between tags in the line dimension, minus 1.
    Only valid if TagGapPresent[bit 1] = 1.

0x6C            DotPairsPerLine             14     0
    Number of output dot pairs to generate per tag line.

0x70            DotStartTagSense            2      0
    Determines for the first/even (bit 0) and second/odd (bit 1) rows of tags whether or not the
    first dot position of the line is in a tag.
    1 = in a tag, 0 = in an inter-tag gap.
0x74            TagGapPresent               2
    Bit 0 is 1 if there is an inter-tag gap in the dot dimension, and 0 if tags are tightly packed.
    Bit 1 is 1 if there is an inter-tag gap in the line dimension, and 0 if tags are tightly packed.

0x78            YScale
    Tag scale factor in Y direction. Output lines to the TFU will be generated YScale times.

0x80 to 0x84    DotStartPos                 2x14
    Determines for the first/even (0) and second/odd (1) rows of tags the number of dot pairs
    remaining, minus 1, in either the tag or inter-tag gap at the start of the line.

0x88 to 0x8C    NumTags                     2x8
    Determines for the first/even and second/odd rows of tags how many tags are present in a line
    (equals number of tags minus 1).


Setup band related registers

0xC0            NextBandStartTagDataAdr
    64-bit aligned DRAM address; should start at a 256-bit aligned location.
    Holds the value of StartTagDataAdr for the next band. This value is copied to StartTagDataAdr
    when DoneBand is 1 and NextBandEnable is 1, or when Go transitions from 0 to 1.

0xC4            NextBandEndOfTagData
    64-bit aligned DRAM address.
    Holds the value of EndOfTagData for the next band. This value is copied to EndOfTagData when
    DoneBand is 1 and NextBandEnable is 1, or when Go transitions from 0 to 1.

0xC8            NextBandFirstTagLineHeight
    Holds the value of FirstTagLineHeight for the next band. This value is copied to
    FirstTagLineHeight when DoneBand is 1 and NextBandEnable is 1, or when Go transitions from
    0 to 1.

0xCC            NextBandEnable
    When NextBandEnable is 1 and DoneBand is 1, then when te_finishedband is set at the end of a
    band:
    - NextBandStartTagDataAdr is copied to StartTagDataAdr
    - NextBandEndOfTagData is copied to EndOfTagData
    - NextBandFirstTagLineHeight is copied to FirstTagLineHeight
    - DoneBand is cleared
    - NextBandEnable is cleared.
    NextBandEnable is cleared when Go is asserted.

Read-only band related registers
0xD0            DoneBand                    1      0
    Specifies whether the tag data interface has finished loading all the tag data for the band.
    It is cleared to 0 when Go transitions from 0 to 1.
    When the tag data interface has finished loading all the tag data for the band, the
    te_finishedband signal is given out and the DoneBand flag is set.
    If NextBandEnable is 1 at this time, then StartTagDataAdr, EndOfTagData and
    FirstTagLineHeight are updated with the values for the next band and DoneBand is cleared.
    Processing of the next band starts immediately.
    If NextBandEnable is 0, then the remainder of the TE will continue to run, while the read
    control unit waits for NextBandEnable to be set before it restarts. Read only.

0xD4            StartTagDataAdr             19     0
    64-bit aligned DRAM address; should start at a 256-bit aligned location.
    The start address of the current row of raw tag data. This initially points to the first
    word of the band's tag data, which should be aligned to a 128-bit boundary (i.e. the lower
    bit of this address should be 0). Read only.

0xD8            EndOfTagData                19     0
    64-bit aligned DRAM address.
    Points to the address of the final tag for the band. When all the tag data up to and
    including address EndOfTagData has been read in, the te_finishedband signal is given and the
    DoneBand flag is set. Read only.

0xDC            FirstTagLineHeight          9      0
    The number of lines minus 1 in the first tag encountered in this band. This will be equal to
    TagMaxLine if the band starts at a tag boundary. Read only.


Work registers (set before starting the TE and must not be touched between bands)

0x100           LineInTag                   1      0
    Determines whether or not the first line of the page is in a line of tags or in an inter-tag
    gap. 1 = in a tag, 0 = in an inter-tag gap.

0x104           LinePos                     14     0
    The number of lines remaining, minus 1, in either the tag or the inter-tag gap at the start
    of the page.

0x110 to 0x11C  TagData                     4x32   0
    This 128-bit register must be set up initially with the fixed data record for the page. This
    is either the lower 40 (or 56) bits (and the EncodeFixed register should be set), or the
    lower 120 bits (and EncodeFixed should be clear). The TagData[0] register contains the lower
    32 bits and the TagData[3] register contains the upper 32 bits.
    This register is used throughout the tag encoding process to hold the next tag's variable
    data.


Work registers (set internally; read-only from the point of view of PCU register access)

0x140           DotPos                      14     0
    Defines the number of dot pairs remaining in either the tag or inter-tag gap. Does not need
    to be set up.

0x144           CurrTagPlaneAdr             14     0
    The dot-pair number being generated.

0x148           DotsInTag                   1      0
    Determines whether the current dot pair is in a tag or not.
    1 = in a tag, 0 = in an inter-tag gap.
0x14C           TagAltSense                 1      0
    Determines whether the production of output dots is for the first (and subsequent even) or
    second (and subsequent odd) row of tags.

0x154           CurrTFSAdr                  19     0
    64-bit aligned DRAM address.
    Points to the start of the next line of the TFS to be read in.

0x158           ReadsRemaining              4      0
    Number of reads remaining in the current burst from the raw tag data interface.

0x15C           CountX                      8      0
    The number of tags remaining to be read (minus 1) by the raw tag data interface for the
    current line.

0x160           CountY                      9      0
    The number of times (minus 1) the tag data for the current line of tags needs to be read in
    by the raw tag data interface.

0x164           RtdTagSense                 1      0
    Determines whether the raw tag data interface is currently reading even rows of tags (= 0) or
    odd rows of tags (= 1) with respect to the start of the page. Note that this can be different
    from TagAltSense, since the raw tag data interface is reading ahead of the production of dots.

0x168           RawTagDataAdr               19     0
    64-bit aligned DRAM address.
    The current read address within the unencoded raw tag data.



PCU writes are decoded at both the top level and the sub-blocks. This is achieved by including write decoders in the sub-blocks as well as the top level, see Figure 153. In order to perform reads, the sub-block registers are fed to the top level, where the read decode is carried out on all the PCU-accessible TE registers.
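As a rough illustration of this split (write decode local to each sub-block, read decode at the top level), the following Python sketch models a few of the Table 126 registers. It is a behavioural model under stated assumptions, not the TE's VHDL: the grouping of registers into sub-blocks, the `SubBlock` class, the `top_level_read` function and the read-as-zero behaviour for unmapped addresses are all hypothetical.

```python
# Hypothetical behavioural model: writes are broadcast and each sub-block decodes
# its own addresses; reads are decoded once, at the top level.

# (address) -> (register name, width in bits), taken from Table 126
REGISTER_MAP = {
    0x140: ("DotPos", 14),
    0x144: ("CurrTagPlaneAdr", 14),
    0x148: ("DotsInTag", 1),
    0x14C: ("TagAltSense", 1),
    0x154: ("CurrTFSAdr", 19),
    0x158: ("ReadsRemaining", 4),
    0x15C: ("CountX", 8),
    0x160: ("CountY", 9),
    0x164: ("RtdTagSense", 1),
    0x168: ("RawTagDataAdr", 19),
}


class SubBlock:
    """A sub-block holding some registers and decoding writes to them locally."""

    def __init__(self, owned_addresses):
        self.regs = {adr: 0 for adr in owned_addresses}

    def write(self, adr, data):
        if adr in self.regs:                      # local write decode
            width = REGISTER_MAP[adr][1]
            self.regs[adr] = data & ((1 << width) - 1)


def top_level_read(sub_blocks, adr):
    """Top-level read decode over the register values fed up from every sub-block."""
    for block in sub_blocks:
        if adr in block.regs:
            return block.regs[adr]
    return 0                                      # assumption: unmapped reads return 0
```

A write such as `block.write(0x15C, 0x1FF)` issued to every sub-block only lands in the one that owns CountX, and is truncated to its 8-bit width.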



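[Figure 153 diagram: pcu_dataout[31:0] and control signals from the PCU feed the write decoders in each sub-block and at the top level; the sub-block registers are fed to the top-level read decode, which drives te_pcu_datain[31:0] and te_pcu_rdy.]

Figure 153. Block diagram of PCU accesses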
26.6.5.1 Starting the TE and restarting the TE between bands

The TE must be started after the TFU. 

For the first band of data, users set up NextBandStartTagDataAdr, NextBandEndTagData and NextBandFirstTagLineHeight as well as other TE configuration registers. Users then set the TE's Go bit to start processing of the band. When the tag data for the band has finished being decoded, the te_finishedband interrupt will be sent to the PCU and ICU, indicating that the memory associated with the first band is now free. Processing can now start on the next band of tag data.

In order to process the next band, NextBandStartTagDataAdr, NextBandEndTagData and NextBandFirstTagLineHeight need to be updated before writing a 1 to NextBandEnable. There are 4 mechanisms for restarting the TE between bands:

a. te_finishedband causes an interrupt to the CPU. The TE will have set its DoneBand bit. The CPU reprograms the NextBandStartTagDataAdr, NextBandEndTagData and NextBandFirstTagLineHeight registers, and sets NextBandEnable to restart the TE.

b. The CPU programs the TE's NextBandStartTagDataAdr, NextBandEndTagData and NextBandFirstTagLineHeight registers and sets the NextBandEnable flag before the end of the current band. At the end of the current band the TE sets DoneBand. As NextBandEnable is already 1, the TE starts processing the next band immediately.

c. The PCU is programmed so that te_finishedband triggers the PCU to execute commands from DRAM to reprogram the NextBandStartTagDataAdr, NextBandEndTagData and NextBandFirstTagLineHeight registers and set the NextBandEnable bit to start the TE processing the next band. The advantage of this scheme is that the CPU could process band headers in advance and store the band commands in DRAM ready for execution.

d. This is a combination of b and c above. The PCU (rather than the CPU in b) programs the TE's NextBandStartTagDataAdr, NextBandEndTagData and NextBandFirstTagLineHeight registers and sets the NextBandEnable bit before the end of the current band. At the end of the current band the TE sets DoneBand and pulses te_finishedband. As NextBandEnable is already 1, the TE starts processing the next band immediately. Simultaneously, te_finishedband triggers the PCU to fetch commands from DRAM. The TE will have restarted by the time the PCU has fetched commands from DRAM. The PCU commands program the TE next band shadow registers and set the NextBandEnable bit.

After the first band on the page, all bands have their first tag start at the top, i.e. NextBandFirstTagLineHeight = TagMaxLine. Therefore the same value of NextBandFirstTagLineHeight will normally be used for all bands. Certainly, NextBandFirstTagLineHeight should not need to change after the second time it is programmed.
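The next-band handshake can be pictured with a small behavioural model of mechanisms a and b (mechanisms c and d drive the same registers from the PCU instead of the CPU). This is a sketch only: the class and method names are hypothetical, and it assumes, which the text does not state explicitly, that loading the NextBand* shadow registers clears both NextBandEnable and DoneBand.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class BandRegs:
    start_tag_data_adr: int = 0      # NextBandStartTagDataAdr
    end_tag_data: int = 0            # NextBandEndTagData
    first_tag_line_height: int = 0   # NextBandFirstTagLineHeight


@dataclass
class TagEncoderBands:
    next_band: BandRegs = field(default_factory=BandRegs)  # shadow registers
    current: Optional[BandRegs] = None                       # band being processed
    next_band_enable: bool = False
    done_band: bool = False
    finished_band_pulses: int = 0                            # te_finishedband count

    def program_next_band(self, start, end, first_height):
        self.next_band = BandRegs(start, end, first_height)

    def go(self):
        """First band: setting Go starts processing with the shadow values."""
        self._load_next_band()

    def set_next_band_enable(self):
        self.next_band_enable = True
        if self.done_band:           # mechanism a: TE is waiting, restart now
            self._load_next_band()

    def end_of_band(self):
        self.done_band = True
        self.finished_band_pulses += 1
        if self.next_band_enable:    # mechanism b: already armed, continue at once
            self._load_next_band()

    def _load_next_band(self):
        self.current = replace(self.next_band)   # copy shadow -> working registers
        self.next_band_enable = False            # assumption
        self.done_band = False                   # assumption
```

Because the working copy is separate from the shadow registers, software can program the next band while the current one is still being processed, which is what makes mechanism b seamless.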
26.6.6 TE Top Level FSM

The following diagram illustrates the states in the FSM. 

[Figure 154 diagram: Reset or Go = 0 returns the FSM to its initial state; the FSM stays in the TagDotLine state while producing valid tag lines.]

Figure 154. Tag Encoder Top-Level FSM

At the highest level, the TE state machine steps through the output lines of a page one line at a time, with the starting position either in an inter-tag gap (signal dotsintag = 0) or in a tag (signals tfsvalid, tdvalid and lineintag = 1). (A SoPEC may be printing only part of a tag, due to multiple SoPECs printing a single line.)

If the current position is within an inter-tag gap, an output of 0 is generated. If the current position is 
within a tag, the tag format structure is used to determine the value of the output dot, using the appropriate 
encoded data bit from the fixed or variable data buffers as necessary. The TE then advances along the line 
of dots, moving through tags and inter-tag gaps according to the tag placement parameters. 
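A minimal sketch of how the dot position counter walks a line through tags and inter-tag gaps. It assumes, as the dotpos = 0 test used for lastdotintag suggests, that tagmaxdotpairs and taggapdot hold the tag and gap widths in dotpairs minus one; the function name `dot_line` is hypothetical, and it returns only the per-dotpair dotsintag flag rather than actual dot values.

```python
def dot_line(dotpairs_per_line, tagmaxdotpairs, taggapdot, start_in_tag, start_dotpos):
    """Return the dotsintag flag (1 = tag, 0 = gap) for each dotpair of one line."""
    flags = []
    dotsintag = start_in_tag
    dotpos = start_dotpos                # dotpairs remaining after the current one
    for _ in range(dotpairs_per_line):
        flags.append(1 if dotsintag else 0)
        if dotpos == 0:
            # end of the current tag or gap: switch and reload the counter
            dotsintag = not dotsintag
            dotpos = tagmaxdotpairs if dotsintag else taggapdot
        else:
            dotpos -= 1
    return flags
```

For example, `dot_line(10, 2, 1, True, 2)` describes tags 3 dotpairs wide separated by 2-dotpair gaps, starting at the left edge of a tag.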

Table 127 highlights the signals used within the FSM. 
Table 127. Signals used within TE top level FSM







Signal               | Description
pclk                 | Sync clock used to register all data within the FSM
prst_n, te_reset     | Reset signals
advtagline           | 1-cycle pulse indicating to the TDI and TFS sub-blocks to move on to the next line of tag data
currdotlineadr[13:0] | Address counter starting 2 pclk cycles ahead of currtagplaneadr to generate the correct dotpair for the current line
dotpos               | Counter to identify how many dotpairs wide the tag/gap is
dotsintag            | Signal identifying whether the dotpairs are in a tag (1) or gap (0)
lineintag_temp       | Identical to lineintag but generated 1 pclk cycle earlier
linepos_shadow       | Shadow register for linepos, needed because linepos is written to by 2 different processes
tagaltsense          | Flag which alternates between tag/gap lines
te_state             | FSM state variable
teplanebuf           | 8-bit shift register used to format dotpairs into a byte for the TFU
wradvline            | Advance line signal, strobed when the last byte in a line is placed on te_tfu_wdata
Due to the 2 system clock delay in the TFS (both Table A and Table B outputs are registered), the TE FSM is working 2 system clock cycles ahead of the logic generating the write data for the TFU. As a result, the following control signals had to be single/double registered on the system clock:



[Figure 155 diagram: dotsintag, tdvalid, tfsvalid and tfu_ok_write are registered on pclk to give dotsintag1, tdvalid1, tfsvalid1 and tfu_ok_write1, and registered again to give dotsintag2, tdvalid2, tfsvalid2 and tfu_ok_write2; lineintag_temp is registered once to give lineintag1.]



Figure 155. Generated Control Signals 
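Why the qualifiers must be delayed can be checked with a cycle-level list model: if the data for the dotpair addressed by currdotlineadr only emerges from the TFS two cycles later, a qualifier derived from the same address lines up with that data only after passing through two registers. The `registered` helper and the tag/gap boundary at dotpair 5 are hypothetical values chosen for illustration.

```python
def registered(samples, stages, reset_value=0):
    """Signal after `stages` pclk registers: out[t] = in[t - stages] (reset value before)."""
    n = len(samples)
    k = min(stages, n)
    return [reset_value] * k + list(samples[: n - k])


currdotlineadr = list(range(8))               # FSM address, 2 cycles ahead of the data
tfs_data_adr = registered(currdotlineadr, 2)  # address whose TFS data is available now

# Hypothetical qualifier computed from the FSM address: dotpairs 0-4 are in a tag.
dotsintag = [1 if adr < 5 else 0 for adr in currdotlineadr]
dotsintag1 = registered(dotsintag, 1)
dotsintag2 = registered(dotsintag, 2)

# Once the pipeline has filled, dotsintag2 describes the same dotpair as the TFS data.
aligned = all(dotsintag2[t] == (1 if tfs_data_adr[t] < 5 else 0) for t in range(2, 8))
```

Using dotsintag (or dotsintag1) directly against the TFS data would mis-classify the dotpairs on either side of a tag boundary.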

The tag_dot_line state can be broken down into 3 different stages.

Stage 1: The state tag_dot_line is entered due to the go signal becoming active. This state controls the writing of dotbytes to the TFU. As long as the tag line buffer address is [...] equal to the dotpairsperline register value, tfu_te_oktowrite is active, and there is valid TFS and TD data available (or the TE is in a tag gap), dotpairs [...] supplied to the TFU, since the TFU is a FIFO rather than the line store used in PEC1.

While generating the dotline of a tag/gap line (lineintag flag = 1), the dot position counter dotpos is decremented/reloaded (with tagmaxdotpairs or taggapdot) as the TE moves between tags/gaps. The dotsintag [...] the end of a dotline approaches (currdotlineadr = dotpairsperline).

[...] of the dotline, the lineintag and tagaltsense signals must be prepared for the next dotline, be it a tag/gap dotline or a purely gap dotline.



Stage 2: At this point the end of a dot line is reached, so it is time to decrement the linepos counter if still in a tag/gap row, or to reload the linepos register and dotpos counter and reprogram the dotsintag flag if going on to [...]. [...] This register is updated a cycle early in order for the real register to get its correct value while switching between dot lines and tag [...]. [...] = 0, the end of a tag/gap has been reached; when linepos = 0, the end of a tag row is reached. This stage uses the signals lineintag_temp and tagaltsense, which were generated one system clock cycle earlier in Stage 1.

Stage 3: [...] implements the writing of dotpairs to the correct part of the 8-bit shift register based on [...]. [...] implements the counter for [...] (dotpairsperline - 1). All the qualifier signals for this stage (e.g. [...]) are delayed by 2 system clock cycles, i.e. the currtagplaneadr (which is the internal write address [...]) is not incremented until the dotpairs are available, which is 2 system clock cycles later than when currdotlineadr is incremented.
The wradvline and advtagline pulses are generated using the same logic (currently separated in the PEC1 Tag Encoder VHDL for clarity). Both of these pulses are used to update further registers, which is why they do not use the qualifiers delayed by 2 system clock cycles.

Combinational Logic 

The TDI is responsible for providing the information data for a tag while the TFSI is responsible for decid- 
ing whether a particular dot on the tag should be printed as background pattern or tag information. Every 
dot within a tag's boundary is either an information dot or part of the background pattern. 



[Figure 156 diagram: the TDI outputs tdi_etd0, tdi_etd1 and tdi_tagisprinted are combined with the TFS interface outputs tfsi_ta_dot0[1:0] and tfsi_ta_dot1[1:0], qualified by dotsintag, to give dots[0] and dots[1].]

Figure 156. Logic to combine dot information and encoded data

The resulting lines of dots are stored in the TFU. 

The TFSI reads one Tag Line Structure (TLS) from the DIU for every dot line of tags. Depending on the current printing position within the tag (indicated by the signal tagdotnum), the TFS interface outputs dot information for two dots and, if necessary, the corresponding read addresses for encoded tag data. The read addresses are supplied to the TDI, which outputs the corresponding data values.

These data values (tdi_etd0 and tdi_etd1) are then combined with the dot information (tfsi_ta_dot0 and tfsi_ta_dot1) to produce the dot values that will actually be printed on the page (dots), see Figure 156.



[Figure 157 diagram: lastdotintag1 is generated from dotsintag, tfsvalid, tdvalid, dotpos and dotpairsperline.]

Figure 157. Generation of lastdotintag/lastdotintag1

The signal lastdotintag is generated by checking that the dots are in a tag (dotsintag = 1) and that the dot-position counter dotpos is equal to zero. It is also used by the TFS to load the index address register with zeros at the end of a tag, as this is always the starting index when going from one tag to the next. lastdotintag is gated with adv_tfs_line in the TFSI (Table C), where the adv_tfs_line pulse is used to update the Table C address register for the new tag line; this is because lastdotintag occurs a cycle early [...] would result in the wrong Table C value for the last dotpair. [...] (etd_buf_switch state) to pulse the etd_adv_tag signal, hence switching buffers in the ETDI [...].

The lastdotintag1 signal is the same as lastdotintag except that it is combinatorially generated (1 cycle earlier than lastdotintag, except at the end of a tag line). The lastdotintag1 signal is only used in the TDI to reset the tdvalid signal on the cycle when dotpos = 0. Note the UNSIGNED(currdotlineadr) = UNSIGNED([...]) comparison in the lastdotintag1_gen process, as this is a combinatorial process.



[Figure 158 diagram: dotsintag1, tfsvalid1, tdvalid1, lineintag1 and tfu_te_oktowrite1 are combined to produce dotposvalid.]

Figure 158. Generation of Dot Position Valid

The dotposvalid signal is created based on being in a tag line (lineintag1 = 1), dots being in a tag (dotsintag1 = 1), having a valid tag format structure available (tfsvalid1 = 1) and [...]. [...] signal is used [...].



[Figure 159 diagram: dotsintag, tfsvalid2, tdvalid2 and currtagplaneadr[13:2] feed the logic that produces te_tfu_we and te_tfu_wradr.]

Figure 159. Generation of write enable to the TFU



Doc: SoPEC_hardware_design 
Version: 2.3 



S3 Proprietary Document 



29 Nov 2002 
Page 383 



SoPEC : Hardware Design 




The signal te_tfu_wdatavalid can only be active if in a tag gap or if valid tag data is available (tdvalid2 and tfsvalid2), and currtagplaneadr(1:0) equals 11, i.e. a byte of data has been generated by combining four dotpairs.
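The byte assembly can be sketched as follows. This is a behavioural illustration only: placing the dotpair with currtagplaneadr(1:0) = k in bits 2k+1..2k of the byte is an assumption (the text only says the part of the shift register written depends on the address), and `pack_dotpairs` is a hypothetical name. A dotpair whose qualifiers are not valid stalls the address, so no byte is emitted until four valid dotpairs have been collected.

```python
def pack_dotpairs(dotpairs, valid):
    """Return the bytes written to the TFU.

    dotpairs -- 2-bit dotpair values, one per cycle
    valid    -- per-cycle qualifier: in a tag gap, or tdvalid2 and tfsvalid2 both set
    """
    currtagplaneadr = 0
    shift_reg = 0          # stands in for the teplanebuf shift register
    written = []
    for pair, ok in zip(dotpairs, valid):
        if not ok:
            continue       # stall: the address does not advance until data is available
        slot = currtagplaneadr & 0b11
        # assumption: slot k occupies bits 2k+1..2k
        shift_reg = (shift_reg & ~(0b11 << (2 * slot))) | ((pair & 0b11) << (2 * slot))
        if slot == 0b11:   # currtagplaneadr(1:0) = 11: a full byte of four dotpairs
            written.append(shift_reg & 0xFF)
        currtagplaneadr += 1
    return written
```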



[Figure 160 diagram: tagdotnum is produced by subtracting dotpos from tagmaxdotpairs.]

Figure 160. Generation of Tag Dot Number

The signal tagdotnum tells the TFS the position of the current dotpair within the tag/gap. It is calculated by subtracting the value in the dotpos counter (the number of dotpairs remaining) from the value programmed in the tagmaxdotpairs register.
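The two position signals derived from dotpos can be written out directly. The sketch assumes tagmaxdotpairs holds the tag width in dotpairs minus one (a 126-dot tag is 63 dotpairs, so 62), which makes tagdotnum run from 0 on the first dotpair to tagmaxdotpairs on the last; the function name is hypothetical.

```python
def tag_position_signals(tagmaxdotpairs, dotpos, dotsintag):
    """Return (tagdotnum, lastdotintag) for the current dotpair.

    tagdotnum    -- index of the current dotpair within the tag
    lastdotintag -- true on the final dotpair of a tag (in a tag and dotpos = 0)
    """
    tagdotnum = tagmaxdotpairs - dotpos
    lastdotintag = bool(dotsintag) and dotpos == 0
    return tagdotnum, lastdotintag
```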



26.7 Tag Data Interface (TDI)



26.7.1 I/O Specification

Table 128. TDI Port List



Port name                  | I/O | Description
Clocks and Resets
pclk                       | In  | SoPEC system clock
prst_n                     | In  | Active-low, synchronous reset in pclk domain.
DIU Read Interface Signals
diu_data[63:0]             | In  | Data from DRAM.
td_diu_rreq                | Out | Data request to DRAM.
td_diu_radr[21:5]          | Out | Read address to DRAM.
diu_td_rack                | In  | Data acknowledge from DRAM.
diu_td_rvalid              | In  | Data valid signal from DRAM.
PCU Interface Data and Control Signals
pcu_dataout[31:0]          | In  | PCU writes this data.
pcu_addr[8:2]              | In  | PCU accesses this address.
pcu_rwn                    | In  | Global read/write-not signal from PCU.
pcu_te_sel                 | In  | PCU selects TE for r/W access.
pcu_te_reset               | In  | PCU reset
td_te_doneband, td_te_dataredun, td_te_decode2den, td_te_variabledatapresent, td_te_encodefixed, td_te_numtags0, td_te_numtags1, td_te_starttagdataadr, td_te_rawtagdataadr, td_te_endoftagdata, td_te_firsttaglineheight, td_te_tagdata0, td_te_tagdata1, td_te_tagdata2, td_te_tagdata3, td_te_countx, td_te_county, td_te_rtdtagsense, td_te_readsremaining | Out | PCU readable registers.
TFS (Tag Format Structure)
tfsi_adr0[8:0]             | In  | Read address for dot0
tfsi_adr1[8:0]             | In  | Read address for dot1
Bandstore Signals
cdu_startofbandstore[24:0] | In  | Start of the memory area allocated for page bands
cdu_endofbandstore[24:0]   | In  | Last address of the memory allocated for page bands
te_finishedband            | Out | Tag encoder band finished
[Figure 161 diagram: TDI architecture; the signals shown include te_finishedband, tdvalid, lastdotintag, lastdotintag1, tagisprinted, etdRdAdr0 and etdRdAdr1.]

Figure 161. TDI Architecture



26.7.2 Introduction 

The tag data interface is responsible for obtaining the raw tag data and encoding it as required by the tag encoder. The smallest typical tag placement is 2 mm x 2 mm, which means a tag is at least 126 dots wide at 1600 dpi.
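The 126-dot width, and the cycle budgets discussed next, follow from simple arithmetic, sketched here in Python (the helper names are illustrative only).

```python
import math

MM_PER_INCH = 25.4


def dots_across(size_mm, dpi=1600):
    """Number of dots spanning size_mm at the given print resolution."""
    return round(size_mm / MM_PER_INCH * dpi)


def cycles_per_tag(tag_dots, dots_per_cycle):
    """Cycles needed to output one tag at dots_per_cycle."""
    return math.ceil(tag_dots / dots_per_cycle)


tag_dots = dots_across(2.0)                 # 2 mm tag at 1600 dpi
pec1_budget = cycles_per_tag(tag_dots, 2)   # PEC1: 2 dots per cycle
sopec_budget = cycles_per_tag(tag_dots, 1)  # SoPEC: 1 dot per cycle
```

At 2 dots per cycle a minimum-size tag therefore leaves 63 cycles to prepare the next one; at 1 dot per cycle the window doubles to 126.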

In PEC1, in order to keep up with the HCU, which processes 2 dots per cycle, the tag data interface has been designed to be capable of encoding a tag in 63 cycles. This is actually accomplished in approximately 52 cycles within PEC1. For SoPEC the TE need only produce one dot per cycle; it should be able to produce tags in no more than twice the time taken by the PEC1 TE. Moreover, any change in implementation from two dots to one dot per cycle should not lose the 63/52 cycle performance edge attained in PEC1.
As shown in Figure 162, the tag data interface contains a raw tag data interface FSM that fetches tag data [...]. [...] TE_dataredun and TE_decode2den [...]:

• (15:5) RS coding, where every 5 input symbols are used to produce 15 output symbols, so the output is 3 times the size of the input. This can be performed on fixed and variable tag data.
• (15:7) RS coding, where every 7 input symbols are used to produce 15 output symbols, so for the same number of input symbols the output is not as large as with the (15:5) code (for more details see section 26.7.6 on page 400). This can be performed on fixed and variable tag data.
• 2D decoding, where every 2 input bits are decoded into 4 output bits, so the output is twice the size of the input. This can be performed on fixed and variable tag data.
• [...]

Each tag is made up of fixed tag data (i.e. this data is the same for each tag on the page) and variable tag data (i.e. different for each tag on the page).

[...] (no coding is required).

[...] Once the fixed tag data is coded it is 120 bits long. It is then stored in the Encoded Tag Data Interface.

The variable tag data is stored in the DRAM in uncoded form. When (15:5) coding is required, the 120 bits stored in DRAM are encoded into 360 bits. When (15:7) coding is required, the 112 bits stored in DRAM are encoded into 240 bits. When 2D decoding is required, the 120 bits stored in DRAM are converted into 240 bits. In each case the encoded bits are stored in the Encoded Tag Data Interface. The encoded fixed and variable tag data are eventually used to print the tag.
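The encoded sizes quoted in this section can be cross-checked with a short calculation. The sketch assumes 4-bit symbols and whole 15-symbol codewords, as in the Reed-Solomon descriptions elsewhere in this section; the mode strings and the `encoded_bits` function are hypothetical.

```python
SYMBOL_BITS = 4
CODEWORD_SYMBOLS = 15


def encoded_bits(data_bits, mode):
    """Size in bits of tag data after the coding selected by `mode`.

    mode -- "none", "2d", "rs15_5" or "rs15_7" (hypothetical labels)
    """
    if mode == "none":
        return data_bits
    if mode == "2d":
        return 2 * data_bits                 # every 2 input bits become 4 output bits
    k = {"rs15_5": 5, "rs15_7": 7}[mode]     # data symbols per codeword
    symbols, rem = divmod(data_bits, SYMBOL_BITS)
    if rem or symbols % k:
        raise ValueError("data must fill whole codewords")
    return (symbols // k) * CODEWORD_SYMBOLS * SYMBOL_BITS
```

The same function reproduces the fixed-data case: 40 bits under (15:5) or 56 bits under (15:7) both encode to 120 bits, i.e. two codewords.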

[...] page. It is encoded as necessary and is then stored in one of the 8x15-bit registers/RAMs in the Encoded Tag Data Interface. This data remains unchanged in the registers/RAMs until the next page is ready to be processed.

The 120 bits of unencoded variable tag data for each tag are stored in four 32-bit words. The TE re-reads [...] that tag. The variable tag data FIFO which reads from DRAM has enough space to store 4 tags.
26.7.3 Data Flow

An overview of the dataflow through the TDI can be seen in Figure 162 below. 



[Figure 162 diagram: Raw Tag Data Interface (8x64) feeding the Tag Data Register, then the Reed-Solomon encoders / 2D decoders, then the Encoded Tag Data Interface. Annotations on the figure:
- Raw tag data interface: minimum dots per tag = 126 (specified); maximum dots per line = 1600 x 12.8 = 20480; maximum tags per line = 20480/126, rounded up to 163; maximum variable data per tag = 120 bits; maximum tag data per line = 120 x 164 bits. The 120 tag data bits are split into 2 x 64 bits (8 spare bits), so one line of tag data needs at most 2 x 64 x 164 = 656 x 32 bits, divided in half to allow simultaneous read/write (total memory = 164 x 2 x 64 = 20992 bits; bit 9 of the address indicates which buffer). Once loaded, the data is valid for at least 126 lines; at the specified 2 dots per cycle, 126 lines contain 20480 x 126 = 2580480 dots, so the data is updated at most every 1290240 cycles. Once printing has started, each half buffer has half a line in which to be loaded: 10240 dots (5120 cycles) for a 12.8 inch line, or 6400 dots (3200 cycles) for an 8 inch line.
- Tag data register: the requested tag is read into this 128-bit buffer, which can be updated up to 163 times per line; each tag is loaded at least 126 times.
- Encoding: one tag's data must be read from the Raw Tag Data Interface, RS encoded and stored in the Encoded Tag Data Interface in 63 cycles or less (minimum tag width 126 dots / 2 dots per cycle = 63 cycles).
- Encoded Tag Data Interface: the fixed store holds up to 120 bits of encoded fixed data for 1 tag, in 2 buffers to allow 2 simultaneous reads in one cycle (120 x 2 = 240 bits); encoded variable data can be up to 360 bits, with 2 buffers for 2 simultaneous reads and 2 more for simultaneous read/write (360 x 2 x 2 = 1440 bits).]

Figure 162. Data Flow Through the TDI

The TD interface consists of the following main sections: 

• the Raw Tag Data Interface - fetches tag data from DRAM; 

• the tag data register; 

• 2 Reed Solomon encoders - each encodes one 4-bit symbol at a time; 

• the Encoded Tag Data Interface - supplies encoded tag data for output; 

• Two 2D decoders. 

The main performance specification for PEC1 is that the TE must be able to output data at a continuous rate of 2 dots per cycle.
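The sizing figures noted on the TDI data-flow diagram can be reproduced from the line width, resolution, minimum tag size and the 2-dots-per-cycle requirement. This Python sketch only restates that arithmetic; the function and key names are hypothetical.

```python
import math

DPI = 1600
MIN_TAG_DOTS = 126
DOTS_PER_CYCLE = 2   # PEC1 output rate


def tdi_budget(line_inches):
    """Reproduce the TDI sizing arithmetic for a given printed line width."""
    dots_per_line = round(DPI * line_inches)
    return {
        "dots_per_line": dots_per_line,
        "max_tags_per_line": math.ceil(dots_per_line / MIN_TAG_DOTS),
        # each half of the double-buffered tag store is refilled in half a line
        "half_line_cycles": dots_per_line // 2 // DOTS_PER_CYCLE,
        # a minimum-width tag leaves this many cycles to fetch and encode the next tag
        "cycles_per_tag": MIN_TAG_DOTS // DOTS_PER_CYCLE,
        # 126 printed lines pass before a given line of tag data is replaced
        "cycles_between_updates": dots_per_line * MIN_TAG_DOTS // DOTS_PER_CYCLE,
    }
```

For a 12.8 inch line this gives 20480 dots, 163 tags, 5120 cycles per half-buffer refill and 63 cycles per tag; an 8 inch line leaves 3200 cycles per refill.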
26.7.4 Raw tag data interface

The raw tag data interface (RTDI) provides a simple means of accessing raw tag data in DRAM. The RTDI passes tag data into a FIFO where it can be subsequently read as required. The 64-bit output from the FIFO [...] being used to set/reset the enable signal (rtdAvail). The FIFO is clocked out on receipt of an rtdRd signal from the TD FSM.
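A behavioural sketch of the FIFO bookkeeping. It assumes rtdAvail simply means the FIFO holds at least one 64-bit word and that wr_rd_counter equals the current occupancy; the class and method names are hypothetical.

```python
from collections import deque


class RawTagDataFifo:
    """8-entry x 64-bit raw tag data FIFO (behavioural model)."""

    DEPTH = 8

    def __init__(self):
        self._entries = deque()

    @property
    def wr_rd_counter(self):
        """Incremented on each FIFO write, decremented on each FIFO read."""
        return len(self._entries)

    @property
    def rtd_avail(self):
        """rtdAvail: at least one 64-bit word is waiting to be read (assumption)."""
        return self.wr_rd_counter > 0

    def can_request_burst(self):
        """Follows the text's test: a 4 x 64-bit burst only while wr_rd_counter < 4."""
        return self.wr_rd_counter < 4

    def write(self, word):
        """Store one 64-bit word returned from DRAM with diu_td_rvalid."""
        if self.wr_rd_counter == self.DEPTH:
            raise OverflowError("raw tag data FIFO overflow")
        self._entries.append(word & (2**64 - 1))

    def rtd_rd(self):
        """Pop the oldest word in response to rtdRd from the TD FSM."""
        if not self._entries:
            raise IndexError("rtdRd asserted while rtdAvail is low")
        return self._entries.popleft()
```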

Figure 163 shows a block diagram of the raw tag data interface. 



[Figure 163 diagram: the DRAM interface writes diu_data[63:0] into the raw tag data FIFO (8x64 bits) at wrptr, using rtd_fifo_wr_en (wrptr advances on FIFO writes, fifo_wr_en / diu_td_rvalid); the FIFO output rtdbuf[63:0] is read at rdptr when rtdrd (generated in the TD FSM) is pulsed, and two rtdbuf words are registered in the tag data register. The rtd state machine also drives te_finishedband, and wr_rd_counter tracks FIFO occupancy to produce rtdavail.]

Figure 163. Raw tag data interface block diagram



26.7.4.1 RTDI FSM 



The RTDI state machine is responsible for keeping the raw tag FIFO full. The state machine reads the line of tag data once for each printline that uses the tag. This means a given line of tag data will be read at least 126 times, since the tag height is 126 lines for 2 mm tags. Note that the first line of tag data may be read fewer than 126 times, since the start of the page may be within a tag. In addition, odd and even rows of tags may contain different numbers of tags.
Section 26.6.5.1 outlines how to start the TE and restart it between bands. Users must set the NextBandStartTagDataAdr, NextBandEndOfTagData, NextBandFirstTagLineHeight, numTags[0] and numTags[1] registers before starting the TE by asserting Go.

To restart the tag encoder for second and subsequent bands of a page, the NextBandStartTagDataAdr, NextBandEndOfTagData and NextBandFirstTagLineHeight registers need to be updated (typically numTags[0] and numTags[1] will be the same if the previous band contains an even number of tag rows) and NextBandEnable set. See Section 26.6.5.1 for a full description of the four ways of reprogramming the TE between bands.

The tag data is read once for every printline containing tags. When maximally packed, a row of tags con- 
tains 163 tags (see Table 121 on page 364). 

The RTDI State Flow diagram is shown in Figure 164. An explanation of the states follows: 

idle state: Stay in the idle state if there is no variable data present. If there is variable data present and there are at least 4 spaces left in the FIFO, then request a burst of 2 tags from the DRAM (1 x 256 bits). Counter countx is assigned the number of tags in an even/odd line, which depends on the value of register rtdtagsense. Down-counter county is assigned the number of dot lines high a tag will be (min 126). Initially it must be set to the firsttaglineheight value, as the TE may be between pages (i.e. a partial tag). For normal tag generation, county will take the value of the tagmaxline register.

diu_access: The diu_access state will generate a request to the DRAM if there are at least 4 spaces in the FIFO. This is indicated by the counter wr_rd_counter, which is incremented/decremented on writes/reads of the FIFO. As long as wr_rd_counter is less than 4 (the FIFO is 8 high), there must be at least 4 locations free. A control signal called td_diu_radrvalid is generated for the duration of the DRAM burst access. Addresses are sent in bursts of 1. The counter burst_count controls this signal. (This will involve modification to existing TE code.)

If there is an odd number of tags in a line, then the last DRAM read will contain a tag in the first 128 bits and padding in the final 128 bits.

fifo_load: This state controls the addressing to the DRAM. Counters countx and county are used to monitor whether the TE is processing a line of dots within a row of tags. When countx is zero, it means all tag dots for this row are complete. When county is zero, it means the TE is on the last line of dots (prior to Y scaling) for this row of tags. When a row of tags is complete, the sense of rtdtagsense is inverted (odd/even). The rawtagdataadr is compared to the te_endoftagdata address. If rawtagdataadr = endoftagdata, the doneband signal is set, the te_finishedband signal is pulsed, and the FSM enters the rtd_stall state until the doneband signal is reset to zero by the PCU, by which time the rawtagdataadr, endoftagdata and firsttaglineheight registers have been set up with new values to restart the TE. This state is also used to count the 64-bit reads from the DIU. Each time diu_td_rvalid is high, rtd_data_count is incremented by 1. The comparison rtd_data_count = rtd_num is necessary to find out when either all 4 x 64-bit data or n x 64-bit data has been received (depending on a match of rawtagdataadr = endoftagdata in the middle of a set of 4 x 64-bit values being returned by the DIU).

rtd_stall: This state waits for the doneband signal to be reset (see page 379 for a description of how this occurs). Once it is reset, the FSM returns to the idle state. This state also performs the same count on the diu_data reads as above, in the case where diu_td_rvalid has not gone high by the time the addressing is complete and the end of band data has been reached, i.e. rawtagdataadr = endoftagdata.
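The RTDI state transitions can be summarised as a next-state function. This sketch ignores the address and counter updates; the enum and argument names are hypothetical, and giving doneband priority over end-of-burst in FIFO_LOAD is an assumption based on the rtd_stall description (the stall state finishes counting any outstanding burst data).

```python
from enum import Enum, auto


class RtdState(Enum):
    IDLE = auto()
    DIU_ACCESS = auto()
    FIFO_LOAD = auto()
    RTD_STALL = auto()


def rtdi_next_state(state, *, variable_data_present, wr_rd_counter,
                    diu_td_rack, end_of_burst, doneband):
    """Next RTDI state for one clock, given the current state and inputs."""
    if state is RtdState.IDLE:
        if variable_data_present and wr_rd_counter < 4:
            return RtdState.DIU_ACCESS        # room for a 4 x 64-bit burst
        return RtdState.IDLE
    if state is RtdState.DIU_ACCESS:
        return RtdState.FIFO_LOAD if diu_td_rack else RtdState.DIU_ACCESS
    if state is RtdState.FIFO_LOAD:
        if doneband:                          # rawtagdataadr reached endoftagdata
            return RtdState.RTD_STALL
        return RtdState.IDLE if end_of_burst else RtdState.FIFO_LOAD
    # RTD_STALL: wait for the PCU to reset doneband
    return RtdState.RTD_STALL if doneband else RtdState.IDLE
```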



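[Figure 164 diagram: states IDLE, DIU_ACCESS, FIFO_LOAD and RTD_STALL. IDLE loops while variabledatapresent = 0 and moves to DIU_ACCESS when variabledatapresent = 1 and wr_rd_counter < 4; DIU_ACCESS moves to FIFO_LOAD when diu_td_rack = 1; FIFO_LOAD returns to IDLE at the end of the burst, or enters RTD_STALL when doneband = 1.]

Figure 164. RTDI State Flow Diagram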



[Figure 165 diagram: DRAM addresses increasing from cdu_startofbandstore to cdu_endofbandstore, with TE_endoftagdata marked for band N and for band N+1.]

Figure 165. Relationship between TE_endoftagdata, cdu_startofbandstore and cdu_endofbandstore
26.7.5 TDI state machine

The tag data state machine has two processing phases. The first processing phase is to encode the fixed tag 
data stored in the 128-bit (2 x 64-bit) tag data register. The second is to encode tag data as it is required by 
the tag encoder. 

When the Tag Encoder is started up, the fixed tag data is already preloaded in the 128-bit tag data record. If encodeFixed is set, then the 2 codewords stored in the lower bits of the tag data record need to be encoded: 40 bits if dataRedun = 0, and 56 bits if dataRedun = 1. If encodeFixed is clear, then the lower 120 bits of the tag data record must be passed to the encoded tag data interface without being encoded.

When encodeFixed is set, the symbols derived from codeword 0 are written to codeword 6 and the symbols derived from codeword 1 are written to codeword 7. The data symbols are stored first and the remaining redundancy symbols are stored afterwards, for a total of 15 symbols. Thus, when dataRedun = 0, the 5 symbols derived from bits 0-19 are written to symbols 0-4, and the redundancy symbols are written to symbols 5-14. When dataRedun = 1, the 7 symbols derived from bits 0-27 are written to symbols 0-6, and the redundancy symbols are written to symbols 7-14.
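The symbol ordering can be illustrated with a sketch that places one codeword's data symbols ahead of its redundancy symbols. The Reed-Solomon redundancy itself is not computed here (it is passed in), the least-significant-nibble-first extraction of symbols from the data bits is an assumption, and `codeword_layout` is a hypothetical name.

```python
def codeword_layout(data_bits, data_redun, redundancy):
    """Return the 15 symbols of a fixed-data codeword, indexed by symbol number.

    data_bits  -- the 20 (dataRedun = 0) or 28 (dataRedun = 1) data bits
    redundancy -- the 10 or 8 redundancy symbols produced by the RS encoder
    """
    k = 7 if data_redun else 5                      # number of data symbols
    if len(redundancy) != 15 - k:
        raise ValueError(f"expected {15 - k} redundancy symbols")
    # assumption: symbol i is taken from bits 4i+3..4i
    data = [(data_bits >> (4 * i)) & 0xF for i in range(k)]
    return data + [r & 0xF for r in redundancy]     # symbols 0..k-1, then k..14
```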

When encodeFixed is clear, the 120 bits of fixed data are copied directly to codewords 6 and 7.
The TDI State Flow diagram is shown in Figure 166. An explanation of the states follows. 




Figure 166. TDI State Flow Diagram

idle: In the idle state, wait for the tag encoder go signal (top_go = 1). The first task is to either store or encode the fixed data. Once the fixed data is stored or encoded/stored, the donefixed flag is set. If there is no variable data, the FSM returns to the idle state, hence the reason to check the donefixed flag before advancing, i.e. only store/encode the fixed data once.

[...] whether to directly store the fixed data in the ETDI, or whether the fixed data needs to be either (15:5) (40 bits) or (15:7) (56 bits) RS encoded or 2D decoded. [...] encodefixed, dataredun and decode2den determine what the next state should be.

bypass_to_etdi: The bypass_to_etdi state takes 120 bits of fixed data (pre-encoded) from the tag_data(127:0) register and stores it in the 15x8 (by 2 for simultaneous reads) buffers. The data is passed from the tag data register through 3 levels of muxing (level1, level2, level3), where it enters the RS0/RS1 encoders [...] zero, hence the data passes [...]. [...] must be high to store this data as codewords [...].

etd_buf_switch: This state is used to set the tdvalid signal and pulse the etd_adv_tag signal, which in turn [...]. The [...] signal is used to identify the first time a tag is encoded. If it is zero, the tag data is read from the RTDI FIFO and encoded. Once the tag is encoded and stored, the FSM returns to this state, where it evaluates the sense of tdvalid. First time around, [...] tdvalid and returns to the readtagdata state to fill the 2nd ETDI buffer. After this, the FSM returns to this state and waits for the lastdotintag signal to arrive. In between tags, when the lastdotintag signal is received, etd_adv_tag is pulsed and the FSM goes to the readtagdata state. However, if the lastdotintag signal arrives at the end of a line, there is an extra 1-cycle delay introduced in generating the etd_adv_tag pulse (via etd_adv_tag_endofline) due to the pipelining in the TFS. This allows all of the previous tag to be read from the correct buffer and a seamless transfer to the other buffer for the next line.
readtagdata: The readtagdata state waits to receive an rtdavail signal from the raw tag data interface, which indicates there is raw tag data available. The tag_data register is 128 bits, so it takes 2 pulses of the rtdrd signal [...] register. If the rtdavail signal is set, rtdrd is pulsed for 1 cycle and the FSM steps on to the loadtagdata state. Initially the flag first64bits will be zero. The 64 bits of rtdbuf are assigned to tag_data[63:0] and the flag first64bits is set to indicate the first raw tag data read is [...]. [...] readtagdata state, where it generates the second rtdrd pulse. [...] loadtagdata state, where the second 64 bits of raw tag data are assigned to tag_data[127:64].

loadtagdata: The loadtagdata state writes the raw tag data into the tag_data register from the RTDI FIFO. The first64bits flag is reset to zero, as the tag_data register now contains 120/112 bits of variable data. A decode of whether to (15:5) or (15:7) RS encode or 2D decode this data decides the next state.

[...] (Reed-Solomon (15:5) mode): This state either encodes 40-bit fixed data or 120-bit variable data, and provides the encoded tag data write address and write enable (etd_wr_adr and etd_we respectively). Once the fixed tag data is encoded, the donefixed flag is set, as this only needs to be done once per page. The variabledatapresent register is then polled to see if there is variable data in the tags. If there is, [...] from the RTDI and loaded into the tag_data register. Otherwise the tdvalid flag must be set and the FSM returns to the idle state. control_5 is a control bit for the RS encoder and controls the feedforward and feedback muxes that enable (15:5) encoding.

[...] also generates the control signals for passing 120 bits of variable tag data to the RS encoders in 4-bit symbols, one symbol per encoder per clock cycle. rs_counter is used both to control the level1_mux and to act as the 15-cycle counter of the RS encoder. This logic cycles for a total of 3 x 15 cycles to encode the 120 bits.
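The 3 x 15 cycle figure follows from the symbol arithmetic, assuming the two encoders RS0 and RS1 each process one 15-cycle codeword at a time in parallel, as the cycle ranges annotated on the codeword-mapping figure suggest. The sketch applies the same assumption to (15:7) encoding of 112 bits; the function name is hypothetical.

```python
SYMBOL_BITS = 4
CYCLES_PER_CODEWORD = 15   # rs_counter period
NUM_ENCODERS = 2           # RS0 and RS1 operate in parallel
TAG_BUDGET_CYCLES = 63     # 126-dot tag at 2 dots per cycle


def rs_encode_cycles(data_bits, k):
    """Cycles to encode data_bits of tag data with a (15:k) code."""
    symbols = -(-data_bits // SYMBOL_BITS)     # ceiling division
    codewords = -(-symbols // k)
    rounds = -(-codewords // NUM_ENCODERS)     # codewords handled by each encoder
    return rounds * CYCLES_PER_CODEWORD
```

Both variable-data cases (45 cycles for (15:5), 30 for (15:7)) fit inside the 63-cycle per-tag window.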

[...] level1_mux has to select 7 4-bit symbols instead of 5.

[...] states provide the control signals for passing the 120-bit variable data to the 2D decoder. The 2 LSBs are decoded to create 4 bits. The 4 bits from each decoder are combined and stored in the ETDI. Next the 2 MSBs are decoded to create 4 bits. Again, the 4 bits from each decoder are combined and stored in the ETDI.

As can be seen from Figure 161 on page 386, there are 3 stages of muxing between the tag data register and the RS encoders or 2D decoders. Levels 1-2 are controlled by level1_mux and level2_mux, which are generated within the TDI FSM, as is the write address to the ETDI buffers (etd_wr_adr).

Figures 167 through 172 illustrate the mappings used to store the encoded fixed and variable tag data in the ETDI buffers.
Figure 167. Mapping of the tag data to codewords 0-7 (d0 to d9 are encoded and stored during cycles N to N+14, d10 to d19 during cycles N+15 to N+29, and d20 to d29 during cycles N+30 to N+44)
Figure 168. Coding and mapping of uncoded Fixed Tag Data for (15,5) RS encoder (d0 to d9 are encoded and stored during cycles N to N+14, producing codewords 6 and 7)



Figure 169. Mapping of pre-coded Fixed Tag Data (d0 to d29 are stored during cycles N to N+14 at write addresses 0x30 to 0x3E)
Figure 170. Coding and mapping of Variable Tag Data for (15,7) RS encoder (d0 to d13 are encoded and stored during cycles N to N+14, and d14 to d27 during cycles N+15 to N+29)



Doc: SoPEC_hardware_design
Version: 2.3 



S3 Proprietary Document 



29 Nov 2002 
Page 397 



SoPEC : Hardware Design 



Figure 171. Coding and mapping of uncoded Fixed Tag Data for (15,7) RS encoder (d0 to d13 are encoded and stored during cycles N to N+14, producing codewords 6 and 7)
Figure 172. Mapping of 2D decoded Variable Tag Data (each value is 4 bits long after 2D decoding)
26.7.6 Reed Solomon (RS) Encoder



26.7.7 Introduction



A Reed Solomon code is a non-binary block code. If a symbol consists of m bits then there are q = 2^m possible symbols defining the code alphabet. In the TE, m = 4, so the number of possible symbols is q = 16. An (n,k) RS code is a block code with k information symbols and n codeword symbols. RS codes have the property that the codeword length n is limited to at most q+1 symbols.

In the case of the TE, both (15,5) and (15,7) RS codes can be used. Since an (n,k) RS code can correct up to (n-k)/2 symbol errors, up to 5 and 4 symbols respectively can be corrected.

Only one type of RS coder is used at any particular time. The RS coder to be used is determined by the registers TE_dataredun and TE_decode2den:

• TE_dataredun = 0 and TE_decode2den = 0, then use the (15,5) RS coder

• TE_dataredun = 1 and TE_decode2den = 0, then use the (15,7) RS coder

For a (15,k) RS code with m = 4, k 4-bit information symbols applied to the coder produce 15 4-bit codeword symbols at the output. In the TE, the code is systematic, so the first k codeword symbols are the same as the k input information symbols.

A simple block diagram can be seen in Figure 173.



Figure 173. Simple block diagram for an m=4 Reed Solomon Encoder (input symbols 1 to k enter an RS (n,k) encoder with symbol size m=4, which outputs codeword symbols 1 to n)

26.7.8 I/O Specification 

An I/O diagram of the RS encoder can be seen in Figure 174.



Figure 174. RS Encoder I/O diagram

26.7.9 Proposed implementation 

In the case of the TE, (15,5) and (15,7) codes are to be used with 4 bits per symbol.

The primitive polynomial is $p(x) = x^4 + x + 1$.

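In the case of the (15,5) code, this gives a generator polynomial of

$$g(x) = (x+\alpha)(x+\alpha^2)(x+\alpha^3)(x+\alpha^4)(x+\alpha^5)(x+\alpha^6)(x+\alpha^7)(x+\alpha^8)(x+\alpha^9)(x+\alpha^{10})$$

$$g(x) = x^{10} + \alpha^2 x^9 + \alpha^3 x^8 + \alpha^9 x^7 + \alpha^6 x^6 + \alpha^{14} x^5 + \alpha^2 x^4 + \alpha x^3 + \alpha^6 x^2 + \alpha x + \alpha^{10}$$

$$g(x) = x^{10} + g_9 x^9 + g_8 x^8 + g_7 x^7 + g_6 x^6 + g_5 x^5 + g_4 x^4 + g_3 x^3 + g_2 x^2 + g_1 x + g_0$$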
In the case of the (15,7) code, this gives a generator polynomial of

$$h(x) = (x+\alpha)(x+\alpha^2)(x+\alpha^3)(x+\alpha^4)(x+\alpha^5)(x+\alpha^6)(x+\alpha^7)(x+\alpha^8)$$

$$h(x) = x^8 + \alpha^{14} x^7 + \alpha^2 x^6 + \alpha^4 x^5 + \alpha^2 x^4 + \alpha^{13} x^3 + \alpha^5 x^2 + \alpha^{11} x + \alpha^6$$

$$h(x) = x^8 + h_7 x^7 + h_6 x^6 + h_5 x^5 + h_4 x^4 + h_3 x^3 + h_2 x^2 + h_1 x + h_0$$
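The expanded coefficients can be reproduced by multiplying out the factors in software. The following Python sketch builds GF(2^4) log/antilog tables from p(x) and expands both products. In GF(2^m), x + α^i and x - α^i are the same factor. The function names are illustrative and are not part of the design.

```python
# GF(2^4) antilog (EXP) and log (LOG) tables built from p(x) = x^4 + x + 1.
EXP = [0] * 30
LOG = [0] * 16
value = 1
for power in range(15):
    EXP[power] = EXP[power + 15] = value
    LOG[value] = power
    value <<= 1
    if value & 0x10:          # degree-4 term appeared: reduce with x^4 = x + 1
        value ^= 0x13


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_from_roots(num_roots: int) -> list[int]:
    """Expand (x + alpha)(x + alpha^2)...(x + alpha^num_roots).

    Returns coefficients lowest degree first: result[j] multiplies x**j."""
    coeffs = [1]
    for i in range(1, num_roots + 1):
        root = EXP[i]
        product = [0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            product[j + 1] ^= c              # c * x
            product[j] ^= gf_mul(c, root)    # c * alpha^i
        coeffs = product
    return coeffs


g = poly_from_roots(10)
h = poly_from_roots(8)
print([LOG[c] for c in g])   # [10, 1, 6, 1, 2, 14, 6, 9, 3, 2, 0] -> g0 ... g10 as powers of alpha
print([LOG[c] for c in h])   # [6, 11, 5, 13, 2, 4, 2, 14, 0]      -> h0 ... h8
```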

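The output codewords are produced by dividing the polynomial made up from the input symbols by the generator polynomial. The input polynomial is first multiplied by x^(n-k), and the remainder of the division supplies the n-k redundancy symbols.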

This division is accomplished using the circuit shown in Figure 175. 



Figure 175. (15,5) & (15,7) RS Encoder block diagram (⊗ denotes a multiplier that multiplies Galois Field elements; ⊕ denotes an adder that adds Galois Field elements; signals include rs_data_in(3:0), rs_data_out(3:0), control_5, control_7, TE_dataredun and mux3)

The data in the circuit are Galois Field elements so addition and multiplication are performed using special 
circuitry. These are explained in the next sections. 

The RS coder can operate either in (15,5) or (15,7) mode. The selection is made by the registers TE_dataredun and TE_decode2den.

When operating in (15,5) mode control_7 is always zero, and when operating in (15,7) mode control_5 is always zero.

Firstly, consider (15,5) mode, i.e. TE_dataredun is set to zero.
For each new set of 5 input symbols, processing is as follows:

The 4 bits of the first symbol d0 are fed to the input port rs_data_in(3:0) and control_5 is set to 0. mux2 is set so as to use the output as feedback, and control_7 is zero, so mux4 selects the input (rs_data_in) as the output (rs_data_out). Once the data has settled (< 1 cycle), the shift registers are clocked. The next symbol d1 is then applied to the input, and again after the data has settled the shift registers are clocked again. This is repeated for the next 3 symbols d2, d3 and d4. As a result, the first 5 outputs are the same as the inputs. After 5 cycles the shift registers contain the next 10 required outputs. control_5 is set to 1 for the next 10 cycles so that zeros are fed back by mux2, and the shift register values are fed to the output by mux3 and mux4 by simply clocking the registers.
A timing diagram is shown in Figure 176.




Figure 176. (15,5) RS Encoder timing diagram
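A behavioural Python model of the division circuit may help when writing a testbench. It is a sketch under stated assumptions, not the gate-level circuit of Figure 175:

* The first symbol fed in is treated as the highest-degree coefficient.
* The register stage nearest the output holds the highest-degree remainder term.
* The phase comments map loosely onto control_5.

The final check relies on a standard property of this code: every codeword has α, α^2, ..., α^(n-k) as roots.

```python
# GF(2^4) tables for p(x) = x^4 + x + 1 (same construction as Table 129).
EXP = [0] * 30
LOG = [0] * 16
value = 1
for power in range(15):
    EXP[power] = EXP[power + 15] = value
    LOG[value] = power
    value <<= 1
    if value & 0x10:
        value ^= 0x13


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_from_roots(num_roots):
    """Generator polynomial (x + alpha)...(x + alpha^num_roots), lowest degree first."""
    coeffs = [1]
    for i in range(1, num_roots + 1):
        product = [0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            product[j + 1] ^= c
            product[j] ^= gf_mul(c, EXP[i])
        coeffs = product
    return coeffs


def rs_encode(data, n=15):
    """Systematic RS(n, k) encoding, one 4-bit symbol per simulated clock."""
    k = len(data)
    g = poly_from_roots(n - k)
    regs = [0] * (n - k)            # regs[-1] is the stage nearest the output
    out = []
    for cycle in range(n):
        if cycle < k:               # data phase (control_5 = 0 in (15,5) mode)
            symbol = data[cycle]
            feedback = symbol ^ regs[-1]
            out.append(symbol)      # systematic: input passes straight to the output
        else:                       # parity phase: zero feedback, shift remainder out
            feedback = 0
            out.append(regs[-1])
        for j in range(n - k - 1, 0, -1):
            regs[j] = regs[j - 1] ^ gf_mul(feedback, g[j])
        regs[0] = gf_mul(feedback, g[0])
    return out


def poly_eval(coeffs_high_first, x):
    """Horner evaluation, highest-degree coefficient first."""
    acc = 0
    for c in coeffs_high_first:
        acc = gf_mul(acc, x) ^ c
    return acc


codeword = rs_encode([1, 2, 3, 4, 5])          # (15,5)
assert codeword[:5] == [1, 2, 3, 4, 5]
assert all(poly_eval(codeword, EXP[i]) == 0 for i in range(1, 11))
```

The same function covers (15,7) mode by passing 7 data symbols. In that case the data phase lasts 7 cycles, which corresponds to control_7 staying low.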

Secondly, consider (15,7) mode, i.e. TE_dataredun is set to one.

In this case processing is similar to the above, except that control_7 stays low while 7 symbols (d0, d1 ... d6) are fed in. As well as being fed back into the circuit, these symbols are fed to the output. After these 7 cycles, control_7 is set to 1 and the contents of the shift registers are fed to the output.
A timing diagram is shown in Figure 177.
Figure 177. (15,7) RS Encoder timing diagram (signals: rs_data_in[3:0], rs_data_out[3:0], rs_counter, TE_dataredun, control_5, control_7)

The enable signal can be used to start/reset the counter and the shift registers. 

The RS encoders can be designed so that encoding starts on a rising enable edge. After 15 symbols have 
been output, the encoder stops until a rising enable edge is detected. As a result there will be a delay 
between each codeword. 

Alternatively, once the enable goes high the shift registers are reset and encoding proceeds until it is told to stop. rs_data_in must be supplied at the correct time. Using this method, data can be output continuously at a rate of 1 symbol per cycle, even over several codewords.

Alternatively, the RS encoder can request data as it requires. 

The performance criterion that must be met is that the following must be carried out within 63 cycles:

• load one tag's raw data into TE_tagdata 

• encode the raw tag data 

• store the encoded tag data in the Encoded Tag Data Interface 

In the case of the raw fixed tag data at the start of a page, there is no definite performance criterion except 
that it should be encoded and stored as fast as possible. 
26.7.10 Galois Field elements and their representation

A Galois Field is a set of elements in which we can do addition, subtraction, multiplication and division without leaving the set.

The TE uses RS encoding over the Galois Field GF(2^4). There are 2^4 = 16 elements in GF(2^4), and they are generated using the primitive polynomial p(x) = x^4 + x + 1.

The 16 elements of GF(2^4) can be represented in a number of different ways. Table 129 shows three possible representations: the power, polynomial and 4-tuple representations.



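Table 129. GF(2^4) representations

Power   Polynomial            4-tuple (coefficients of 1, x, x^2, x^3)
0       0                     (0 0 0 0)
1       1                     (1 0 0 0)
α       x                     (0 1 0 0)
α^2     x^2                   (0 0 1 0)
α^3     x^3                   (0 0 0 1)
α^4     1 + x                 (1 1 0 0)
α^5     x + x^2               (0 1 1 0)
α^6     x^2 + x^3             (0 0 1 1)
α^7     1 + x + x^3           (1 1 0 1)
α^8     1 + x^2               (1 0 1 0)
α^9     x + x^3               (0 1 0 1)
α^10    1 + x + x^2           (1 1 1 0)
α^11    x + x^2 + x^3         (0 1 1 1)
α^12    1 + x + x^2 + x^3     (1 1 1 1)
α^13    1 + x^2 + x^3         (1 0 1 1)
α^14    1 + x^3               (1 0 0 1)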
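Table 129 can be regenerated from p(x) by repeatedly multiplying by x and reducing with x^4 = x + 1. Below is a minimal Python sketch. The tuple order (b0 b1 b2 b3) matches the table. Encoding each element as an integer with bit i holding the coefficient of x^i is an assumption made for convenience.

```python
def gf16_representations():
    """Return (power, tuple) pairs for alpha^0 .. alpha^14, tuple = (b0, b1, b2, b3)."""
    rows = []
    value = 1                                  # alpha^0 = 1
    for power in range(15):
        rows.append((power, tuple((value >> bit) & 1 for bit in range(4))))
        value <<= 1                            # multiply by x
        if value & 0x10:                       # x^4 term: replace it with x + 1
            value ^= 0x13                      # 0x13 = 0b10011 = x^4 + x + 1
    return rows


for power, bits in gf16_representations():
    print(f"alpha^{power:<2} {bits}")
```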



26.7.11 Multiplication of GF(2^4) elements

The multiplication of two field elements α^a and α^b is defined as

$$\alpha^c = \alpha^a \cdot \alpha^b = \alpha^{(a+b) \bmod 15}$$

Thus, for example, $\alpha^{10} \cdot \alpha^{8} = \alpha^{18 \bmod 15} = \alpha^{3}$.

So if we have the elements in exponential form, multiplication is simply a matter of modulo 15 addition. If the elements are in polynomial/tuple form, the polynomials must be multiplied and reduced mod x^4 + x + 1.



Suppose we wish to multiply the two field elements in GF(2^4):

$$\alpha^a = a_3x^3 + a_2x^2 + a_1x + a_0$$
$$\alpha^b = b_3x^3 + b_2x^2 + b_1x + b_0$$

where the a_i and b_i are in the field {0,1} (i.e. modulo 2 arithmetic).

Multiplying these out and using x^4 + x + 1 = 0 we get:

$$\begin{aligned} \alpha^{a+b} = {} & [(a_0b_3 + a_1b_2 + a_2b_1 + a_3b_0) + a_3b_3]x^3 \\ & + [(a_0b_2 + a_1b_1 + a_2b_0) + a_3b_3 + (a_3b_2 + a_2b_3)]x^2 \\ & + [(a_0b_1 + a_1b_0) + (a_2b_3 + a_3b_2) + (a_1b_3 + a_2b_2 + a_3b_1)]x \\ & + (a_0b_0 + a_1b_3 + a_2b_2 + a_3b_1) \end{aligned}$$

$$\begin{aligned} \alpha^{a+b} = {} & [a_0b_3 + a_1b_2 + a_2b_1 + a_3(b_0 + b_3)]x^3 \\ & + [a_0b_2 + a_1b_1 + a_2(b_0 + b_3) + a_3(b_2 + b_3)]x^2 \\ & + [a_0b_1 + a_1(b_0 + b_3) + a_2(b_2 + b_3) + a_3(b_1 + b_2)]x \\ & + (a_0b_0 + a_1b_3 + a_2b_2 + a_3b_1) \end{aligned}$$
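The grouped equations translate directly into AND/XOR logic. The Python sketch below is a model rather than RTL. It implements the equations bit by bit and cross-checks all 256 operand pairs against log/antilog multiplication, i.e. against the modulo-15 exponent addition described above.

```python
EXP = [0] * 30
LOG = [0] * 16
value = 1
for power in range(15):
    EXP[power] = EXP[power + 15] = value
    LOG[value] = power
    value <<= 1
    if value & 0x10:
        value ^= 0x13


def gf_mul_bits(a: int, b: int) -> int:
    """Multiply two 4-bit field elements using the reduced product equations."""
    a0, a1, a2, a3 = ((a >> i) & 1 for i in range(4))
    b0, b1, b2, b3 = ((b >> i) & 1 for i in range(4))
    c3 = (a0 & b3) ^ (a1 & b2) ^ (a2 & b1) ^ (a3 & (b0 ^ b3))
    c2 = (a0 & b2) ^ (a1 & b1) ^ (a2 & (b0 ^ b3)) ^ (a3 & (b2 ^ b3))
    c1 = (a0 & b1) ^ (a1 & (b0 ^ b3)) ^ (a2 & (b2 ^ b3)) ^ (a3 & (b1 ^ b2))
    c0 = (a0 & b0) ^ (a1 & b3) ^ (a2 & b2) ^ (a3 & b1)
    return c0 | (c1 << 1) | (c2 << 2) | (c3 << 3)


def gf_mul_log(a: int, b: int) -> int:
    """Reference: add exponents modulo 15."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 15]


assert all(gf_mul_bits(a, b) == gf_mul_log(a, b) for a in range(16) for b in range(16))
print(gf_mul_bits(EXP[10], EXP[8]) == EXP[3])   # True: alpha^10 * alpha^8 = alpha^3
```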



If we wish to multiply an arbitrary field element by a fixed field element, we get a simpler form. Suppose we wish to multiply α^b by α^3.

In this case α^3 = x^3, so (a_0 a_1 a_2 a_3) = (0 0 0 1). Substituting this into the above equation gives

$$\alpha^{b+3} = (b_0 + b_3)x^3 + (b_2 + b_3)x^2 + (b_1 + b_2)x + b_1$$

This can be implemented using simple XOR gates, as shown in Figure 178.

Figure 178. Circuit for multiplying by α^3 (built from exclusive-OR gates)
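The α^3 multiplier needs only three XOR gates plus one wire. A Python sketch of that circuit follows. It checks the circuit against every power of α, using the integer encoding of Table 129 in which bit i is the coefficient of x^i (an assumption of this sketch).

```python
def mul_alpha3(b: int) -> int:
    """Multiply a 4-bit field element by alpha^3 using only XORs."""
    b0, b1, b2, b3 = ((b >> i) & 1 for i in range(4))
    c0 = b1
    c1 = b1 ^ b2
    c2 = b2 ^ b3
    c3 = b0 ^ b3
    return c0 | (c1 << 1) | (c2 << 2) | (c3 << 3)


# alpha^0 .. alpha^14 as 4-bit integers (bit i = coefficient of x^i), from Table 129.
POWERS = [1, 2, 4, 8, 3, 6, 12, 11, 5, 10, 7, 14, 15, 13, 9]
assert mul_alpha3(0) == 0
assert all(mul_alpha3(POWERS[i]) == POWERS[(i + 3) % 15] for i in range(15))
```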

26.7.12 Addition of GF(2^4) elements

If the elements are in their polynomial/tuple form, the polynomials are simply added.
Suppose we wish to add the two field elements in GF(2^4):

$$\alpha^a = a_3x^3 + a_2x^2 + a_1x + a_0$$
$$\alpha^b = b_3x^3 + b_2x^2 + b_1x + b_0$$

where the a_i and b_i are in the field {0,1} (i.e. modulo 2 arithmetic). Then

$$\alpha^a + \alpha^b = (a_3 + b_3)x^3 + (a_2 + b_2)x^2 + (a_1 + b_1)x + (a_0 + b_0)$$

Again this can be implemented using simple XOR gates, as shown in Figure 179.

Figure 179. Adding two field elements (each output bit is the exclusive-OR of the corresponding input bits)

26.7.13 Reed Solomon Implementation 

The designer can decide to create the relevant addition and multiplication circuits and instantiate them where necessary. Alternatively, the feedback multiplications can be combined. Consider the multiplication

$$\alpha^c = \alpha^a \cdot \alpha^b$$

or in terms of polynomials

$$(a_3x^3 + a_2x^2 + a_1x + a_0)(b_3x^3 + b_2x^2 + b_1x + b_0) = (c_3x^3 + c_2x^2 + c_1x + c_0)$$

If we substitute all of the possible field elements in for α^a and express the result in terms of α^b, we get the table of results shown in Table 130.



Table 130. α^a multiplied by all field elements, expressed in terms of α^b

α^a     (a0 a1 a2 a3)   c0            c1            c2            c3
0       (0 0 0 0)       0             0             0             0
1       (1 0 0 0)       b0            b1            b2            b3
α       (0 1 0 0)       b3            b0+b3         b1            b2
α^2     (0 0 1 0)       b2            b2+b3         b0+b3         b1
α^3     (0 0 0 1)       b1            b1+b2         b2+b3         b0+b3
α^4     (1 1 0 0)       b0+b3         b0+b1+b3      b1+b2         b2+b3
α^5     (0 1 1 0)       b2+b3         b0+b2         b0+b1+b3      b1+b2
α^6     (0 0 1 1)       b1+b2         b1+b3         b0+b2         b0+b1+b3
α^7     (1 1 0 1)       b0+b1+b3      b0+b2+b3      b1+b3         b0+b2
α^8     (1 0 1 0)       b0+b2         b1+b2+b3      b0+b2+b3      b1+b3
α^9     (0 1 0 1)       b1+b3         b0+b1+b2+b3   b1+b2+b3      b0+b2+b3
α^10    (1 1 1 0)       b0+b2+b3      b0+b1+b2      b0+b1+b2+b3   b1+b2+b3
α^11    (0 1 1 1)       b1+b2+b3      b0+b1         b0+b1+b2      b0+b1+b2+b3
α^12    (1 1 1 1)       b0+b1+b2+b3   b0            b0+b1         b0+b1+b2
α^13    (1 0 1 1)       b0+b1+b2      b3            b0            b0+b1
α^14    (1 0 0 1)       b0+b1         b2            b3            b0

From Table 130, the following signals are required:

• b0, b1, b2, b3

• (b0+b1), (b0+b2), (b0+b3), (b1+b2), (b1+b3), (b2+b3)

• (b0+b1+b2), (b0+b1+b3), (b0+b2+b3), (b1+b2+b3)

• (b0+b1+b2+b3)
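Table 130 and the list of required signals can be regenerated mechanically. Multiplying by a constant is linear over GF(2), so it is enough to multiply the constant by each basis element x^j: output bit c_i contains b_j exactly when α^a · x^j has bit i set. Below is a Python sketch; the names are illustrative.

```python
EXP = [1, 2, 4, 8, 3, 6, 12, 11, 5, 10, 7, 14, 15, 13, 9]   # alpha^0 .. alpha^14
LOG = {v: i for i, v in enumerate(EXP)}


def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % 15]


def constant_multiplier(k: int) -> list[str]:
    """XOR terms for c0..c3 when an arbitrary element b is multiplied by alpha^k."""
    terms = [[] for _ in range(4)]
    for j in range(4):                       # contribution of input bit b_j
        product = gf_mul(EXP[k], 1 << j)     # alpha^k * x^j
        for i in range(4):
            if (product >> i) & 1:
                terms[i].append(f"b{j}")
    return ["+".join(t) for t in terms]


for k in range(15):
    print(f"alpha^{k:<2}", constant_multiplier(k))
```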

The implementation of the circuit can be seen in Figure 180. The main components are XOR gates, 4-bit shift registers and multiplexers.

The RS encoder has 4 input lines labelled 0, 1, 2 & 3 and 4 output lines labelled 0, 1, 2 & 3. This labelling corresponds to the subscripts of the polynomial/4-tuple representation. The mapping of 4-bit symbols from the TE_tagdata register into the RS encoder is as follows:

- the LSB of the symbol in TE_tagdata is fed into line0

- the next more significant bit is fed into line1

- the next more significant bit is fed into line2

- the MSB is fed into line3

The RS output mapping to the Encoded tag data interface is similar. Two encoded symbols are stored in an 8-bit address. Within these 8 bits:

- line0 is fed into the LSB (bit 0/4)

- line1 is fed into the next more significant bit (bit 1/5)

- line2 is fed into the next more significant bit (bit 2/6)

- line3 is fed into the MSB (bit 3/7)
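The symbol-to-line mapping and the pairing of two symbols per 8-bit ETDi location can be modelled as follows. This is a sketch with two inferred conventions:

* Which codeword of a pair lands in bits 3:0 and which in bits 7:4 is an assumption. Figure 167 lists codeword 3 before codeword 2, which suggests the odd codeword occupies the upper nibble.
* The write-address layout (2-bit codeword pair, 4-bit symbol index) is inferred from the wradr values 0x10 to 0x1E and 0x30 to 0x3E shown in Figures 167 and 169.

```python
def symbol_lines(symbol: int) -> list[int]:
    """Split a 4-bit symbol into RS lines line0..line3 (line0 = LSB)."""
    return [(symbol >> line) & 1 for line in range(4)]


def pack_pair(even_symbol: int, odd_symbol: int) -> int:
    """Store two encoded symbols in one 8-bit word: line i -> bit i (even) or bit i+4 (odd).

    Assumption: the odd-numbered codeword of the pair uses bits 7:4."""
    return ((odd_symbol & 0xF) << 4) | (even_symbol & 0xF)


def etd_write_address(codeword: int, symbol_index: int) -> int:
    """wradr(5:0): codeword pair in bits 5:4, symbol 0..14 in bits 3:0 (inferred)."""
    return ((codeword >> 1) << 4) | symbol_index


print(hex(etd_write_address(3, 14)))   # 0x1e, the last entry of codewords 2/3
print(hex(pack_pair(0xA, 0x5)))        # 0x5a
```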







Figure 180. RS Encoder Implementation (constant multipliers h0(α^6), h1(α^11), h2(α^5), h3(α^13), h4(α^2), h5(α^4), h6(α^2), h7(α^14) and g0(α^10), g1(α), g2(α^6), g3(α), g4(α^2), g5(α^14), g6(α^6), g7(α^9), g8(α^3), g9(α^2), built from exclusive-OR gates, 4-bit shift registers and multiplexers controlled by control_5; input rs_data_in(3:0))
26.7.14 2D Decoder



The 2D decoder is selected when TE_decode2den = 1. It operates on variable tag data only; its function is to convert 2 bits into 4 bits according to Table 131.



Table 131. Operation of 2D decoder

Input   Output
00      0001
01      0010
10      0100
11      1000
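Table 131 is a one-hot (1-of-4) decode. Below is a Python sketch of it. It assumes the output strings in the table are written MSB first. It also assumes, following the TDi FSM description, that the two LSBs of each 4-bit slice are decoded before the two MSBs. The function names are hypothetical.

```python
def decode_2d(two_bits: int) -> int:
    """Table 131: 00 -> 0001, 01 -> 0010, 10 -> 0100, 11 -> 1000."""
    return 1 << (two_bits & 0b11)


def decode_symbol(nibble: int) -> tuple[int, int]:
    """Decode a 4-bit slice: the 2 LSBs first, then the 2 MSBs."""
    return decode_2d(nibble & 0b11), decode_2d((nibble >> 2) & 0b11)


print([format(decode_2d(v), "04b") for v in range(4)])   # ['0001', '0010', '0100', '1000']
```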



26.7.15 Encoded tag data interface



The encoded tag data interface contains an encoded fixed tag data store interface and an encoded variable 
tag data store interface, as shown in Figure 181. 



Figure 181. Encoded tag data interface (signals include datain, rdAdr0, advTag, etd0 and etd1)
The two reord units simply reorder the 9 input bits to map low-order codewords into the bit selection component of the address, as shown in Table 132. Reordering of write addresses is not necessary, since the addresses are already in the correct format.



Table 132. Reord unit 
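One reading of the reord unit is that the low-order codeword bit moves into the bit-select field. This is consistent with the read-address description of the variable store: 2 bits select 1 of 3 codeword pairs, 4 bits select 1 of 15 symbols, and 3 bits select 1 of 8 bits.

The Python sketch assumes the 9-bit Table B address is laid out as CodeWordSelect in bits 8:6, SymbolSelect in bits 5:2 and BitSelect in bits 1:0. The contents of Table 132 are not available, so treat that layout, and the nibble order, as assumptions.

```python
def split_table_b_address(adr: int) -> tuple[int, int, int]:
    """Assumed layout: codeword[8:6], symbol[5:2], bit[1:0]."""
    return (adr >> 6) & 0b111, (adr >> 2) & 0b1111, adr & 0b11


def reord(adr: int) -> tuple[int, int, int, bool]:
    """Map a 9-bit Table B address to (pair, symbol, bit_in_byte, is_fixed)."""
    codeword, symbol, bit = split_table_b_address(adr)
    if symbol == 0b1111:
        raise ValueError("symbol select 1111 is invalid")
    pair = codeword >> 1                        # codewords 2p and 2p+1 share a byte
    bit_in_byte = ((codeword & 1) << 2) | bit   # odd codeword -> bits 7:4 (assumed)
    return pair, symbol, bit_in_byte, codeword >= 6   # codewords 6, 7 are fixed data


print(reord((3 << 6) | (5 << 2) | 2))   # (1, 5, 6, False)
```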




The encoded fixed data interface is a single 15 x 8-bit RAM with 2 read ports and 1 write port. As it is only written to during page setup time (it is fixed for the duration of a page), there is no need for simultaneous read/write access. However, the fixed data store must be capable of decoding two simultaneous reads in a single cycle. Figure 182 shows the implementation of the fixed data store.



Figure 182. Encoded fixed tag data interface (a 15 x 8-bit RAM with write inputs wrAdr and dataIn and two read ports, rdAdr0/out0 and rdAdr1/out1)

The encoded variable tag data interface is a double buffered 3 x 15 x 8-bit RAM with 2 read ports and 1 write port. The double buffering allows one tag's data to be read (two reads in a single cycle) while the next tag's variable data is being stored. Write addressing is 6 bits: 2 bits of address for selecting 1 of 3, and 4 bits of address for selecting 1 of 15. Read addressing is the same, with the addition of 3 more address bits for selecting 1 of 8.

Figure 183 shows the implementation of the encoded variable tag data store. Double buffering is implemented via two sub-buffers. Each time an AdvTag pulse is received, the sense of which sub-buffer is being read from or written to changes. This is accomplished by a 1-bit flag called wrsb0. Although the initial state of wrsb0 is irrelevant, it must invert upon receipt of an AdvTag pulse. The structure of each sub-buffer is shown in Figure 184.
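The wrsb0 mechanism is a classic ping-pong buffer. Below is a behavioural Python sketch of it. The class and method names are hypothetical, and addresses are flattened to 0..44 to cover the 3 x 15 bytes of one sub-buffer. As the text notes, the initial value of wrsb0 is arbitrary.

```python
class DoubleBufferedStore:
    """Two sub-buffers whose read/write roles swap on every AdvTag pulse."""

    def __init__(self, size: int = 3 * 15):
        self.sub_buffers = [[0] * size, [0] * size]
        self.wrsb0 = True          # True: write sub-buffer 0, read sub-buffer 1

    def adv_tag(self) -> None:
        self.wrsb0 = not self.wrsb0    # invert on every AdvTag pulse

    def write(self, adr: int, data: int) -> None:
        self.sub_buffers[0 if self.wrsb0 else 1][adr] = data

    def read(self, adr0: int, adr1: int) -> tuple[int, int]:
        buf = self.sub_buffers[1 if self.wrsb0 else 0]
        return buf[adr0], buf[adr1]    # two reads in one (simulated) cycle


store = DoubleBufferedStore()
store.write(0, 0x5A)
store.write(7, 0x3C)
store.adv_tag()                        # next tag: the written half becomes readable
print(store.read(0, 7))               # (90, 60)
```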



Figure 183. Encoded variable tag data interface (signals rdAdr0, rdAdr1, wrAdr, dataIn and advTag; the 1-bit wrsb0 flag selects between variable tag data sub-buffer 0 and sub-buffer 1, which drive out0 and out1)
Figure 184. Encoded variable tag data sub-buffer



26.8 Tag Format Structure (TFS) Interface 



26.8.1 Introduction 



The TFS specifies the contents of every dot position within a tag's border, i.e.:

• is the dot part of the background? 

• is the dot part of the data? 

The TFS is broken up into Tag Line Structures (TLS), which specify the contents of every dot position in a particular line of a tag. Each TLS consists of three tables: A, B and C (see Figure 185).

For a given line of dots, all the tags on that line correspond to the same tag line structure. Consequently, for 
a given line of output dots, a single tag line structure is required, and not the entire TFS. Double buffering 
allows the next tag line structure to be fetched from the TFS in DRAM while the existing tag line structure 
is used to render the current tag line. 

The TFS interface is responsible for loading the appropriate line of the tag format structure as the tag 
encoder advances through the page. It is also responsible for producing table A and table B outputs for two 
consecutive dot positions in the current tag line. 

Figure 185. Breakdown of the Tag Format Structure (the TFS runs from TE_tfsstartadr to TE_tfsendadr and holds one TLS per dot line of each tag, e.g. TLS X_0 to TLS X_n for tag X; each TLS holds Table A: 24 x 32 bits = 768 bits (384 entries x 2 bits), Table B: 9 x 32 bits = 288 bits (32 entries x 9 bits), Table C: 10 bits (2 entries x 5 bits), and 22 reserved, unused bits)

There is a TLS for every dot line of a tag. 

All tags that are on the same line have the exact same TLS. 

A tag can be up to 384 dots wide, so each of these 384 dots must be specified in the TLS. 

The TLS information is stored in DRAM, and one TLS must be read into the TFS Interface for each line of dots that is output to the Tag Plane Line Buffers.

Each TLS consists of 17 64-bit words (1088 bits). This is read from DRAM as 5 256-bit words (1280 bits), with 192 padded bits in the last 256-bit DRAM read.
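The TLS arithmetic can be checked, and a fetched TLS split into its tables, with the Python sketch below. It assumes that 64-bit word 0 holds the lowest-numbered bits and that entries are packed LSB first, with Table A, then Table B, then Table C. That is the order given in Figure 185 and in the tls_load description (words 0-11 for Table A). The packing within words is an assumption.

```python
TABLE_A_BITS = 384 * 2      # 768 bits
TABLE_B_BITS = 32 * 9       # 288 bits
TABLE_C_BITS = 2 * 5        # 10 bits
RESERVED_BITS = 22


def tls_layout() -> tuple[int, int, int]:
    """Return (64-bit words per TLS, 256-bit DRAM reads, padding bits)."""
    total = TABLE_A_BITS + TABLE_B_BITS + TABLE_C_BITS + RESERVED_BITS
    words = total // 64
    reads = -(-total // 256)                 # ceiling division
    return words, reads, reads * 256 - total


def unpack_tls(words: list[int]) -> tuple[list[int], list[int], list[int]]:
    """Split 17 64-bit words into Table A (384 x 2), Table B (32 x 9), Table C (2 x 5)."""
    bits = 0
    for i, word in enumerate(words):
        bits |= word << (64 * i)
    table_a = [(bits >> (2 * e)) & 0x3 for e in range(384)]
    table_b = [(bits >> (TABLE_A_BITS + 9 * e)) & 0x1FF for e in range(32)]
    base_c = TABLE_A_BITS + TABLE_B_BITS
    table_c = [(bits >> (base_c + 5 * e)) & 0x1F for e in range(2)]
    return table_a, table_b, table_c


print(tls_layout())   # (17, 5, 192)
```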



26.8.2 I/O Specification 

Table 133. Tag Format Structure Interface Port List

Port name                       I/O   Description
pclk                            In    SoPEC system clock
…                               In    Active-low, synchronous reset in pclk domain
top_go                          In    Go signal from TE top level

DRAM
diu_data[63:0]                  In    Data from DRAM
diu_tfs_rack                    In    Data acknowledge from DRAM
diu_tfs_rvalid                  In    Data valid from DRAM
tfs_diu_rreq                    Out   Read request to DRAM
tfs_diu_radr[21:5]              Out   Read address to DRAM

Tag encoder top level
top_advtagline                  In    Pulsed after the last line of a row of tags
top_tagaltsense                 In    For even tag rows = 0, i.e. 0, 2, 4, ...; for odd tag rows = 1, i.e. 1, 3, 5, ...
top_lastdotintag                In    Last dot in tag is currently being processed
top_dotposvalid                 In    Current dot position is a tag dot and its structure data and tag data are available
top_tagdotnum[7:0]              In    Counts from zero up to TE_tagmaxdotpairs (min. = 1, max. = 192)
tfsl_valid                      Out   TLS tables A, B and C ready for use
tfsl_ta_dot0[1:0]               Out   Even entry from Table A corresponding to top_tagdotnum
tfsl_ta_dot1[1:0]               Out   Odd entry from Table A corresponding to top_tagdotnum

Tag encoder top level (PCU read decoder)
tfs_te_tfsstartadr[23:0]        Out   TFS tfsstartadr register
tfs_te_tfsendadr[23:0]          Out   TFS tfsendadr register
tfs_te_tfsfirstlineadr[23:0]    Out   TFS tfsfirstlineadr register
tfs_te_currtfsadr[23:0]         Out   TFS currtfsadr register

TDI
tfsl_tdi_adr0[8:0]              Out   Read address for dot0 (even dot)
tfsl_tdi_adr1[8:0]              Out   Read address for dot1 (odd dot)
26.8.2.1 State machine

The state machine is responsible for generating control signals for the various TFS table units, and for loading the appropriate line from the TFS. The states are explained below.

idle:- Wait for top_go to become active. Pulse adv_tfs_line for 1 cycle to reset the tawradr and tbwradr registers. Pulsing adv_tfs_line switches the read/write sense of Table B, so Table A is switched here as well to keep the two tables consistent, i.e. wrta0 = NOT(wrta0).

diu_access:- In the diu_access state a request is sent to the DIU. Once an ack signal is received, the Table A write enable is asserted and the FSM moves to the tls_load state.

tls_load:- The DRAM access is a burst of 5 256-bit accesses, ultimately returned by the DIU as 5*(4*64-bit) words. There will be 192 padded bits in the last 256-bit DRAM word. The first 12 64-bit word reads are for Table A, words 12 to 15 and part of word 16 are for Table B, and the rest of read 16 is for Table C. The counter read_num is used to identify which data goes to which table. The Table B data is stored temporarily in a 288-bit register until the tls_update state, hence tbwe does not become active until read_num = 16.

• The DIU data goes directly into Table A (12 * 64).

• The DIU data for Table B is loaded into a 288-bit register.

• The DIU data goes directly into Table C.

tls_update:- The 288 bits of Table B need to be written to a 32*9 buffer. The tls_update state takes care of this using the read_num counter.

tls_next:- This state checks the logic level of tfsvalid and switches the read/write sense of Table A (wrta0) and, a cycle later, of Table B (using the adv_tfs_line pulse). Table A is switched a cycle early to make sure that the top-level address via tagdotnum points to the correct buffer. Keep in mind that the top level works a cycle ahead of Table A and 2 cycles ahead of Table B.

If tfsValid is 1, the state machine waits until the advTagLine signal is received. When it is received, the state machine pulses advTFSLine (to switch read/write sense in tables A, B, C), and starts reading the next line of the TFS from currTFSAdr.

If tfsValid is 0, the state machine pulses advTFSLine (to switch read/write sense in tables A, B, C) and then jumps to the tls_tfsvalid_set state, where the signal tfsValid is set to 1 (allowing the tag encoder to start, or to continue if it had been stalled). The state machine can then start reading the next line of the TFS from currTFSAdr.

tls_tfsvalid_set:- Simply sets the tfsValid signal and returns the FSM to the diu_access state.



If an advTagLine signal is received before the next line of the TFS has been read in, tfsValid is cleared to 0 
and processing continues as outlined above. 
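The state descriptions above can be summarised as a next-state function. This Python sketch models transitions only; outputs such as the adv_tfs_line pulses and the write enables are omitted. The flags load_done and update_done are hypothetical stand-ins for the read_num-based conditions of Figure 186.

```python
from enum import Enum, auto


class TfsState(Enum):
    IDLE = auto()
    DIU_ACCESS = auto()
    TLS_LOAD = auto()
    TLS_UPDATE = auto()
    TLS_NEXT = auto()
    TLS_TFSVALID_SET = auto()


def next_state(state: TfsState, *, go: bool = False, rack: bool = False,
               load_done: bool = False, update_done: bool = False,
               tfs_valid: bool = False, adv_tag_line: bool = False) -> TfsState:
    if state is TfsState.IDLE:
        return TfsState.DIU_ACCESS if go else state
    if state is TfsState.DIU_ACCESS:
        return TfsState.TLS_LOAD if rack else state            # wait for the DIU ack
    if state is TfsState.TLS_LOAD:
        return TfsState.TLS_UPDATE if load_done else state
    if state is TfsState.TLS_UPDATE:
        return TfsState.TLS_NEXT if update_done else state
    if state is TfsState.TLS_NEXT:
        if not tfs_valid:
            return TfsState.TLS_TFSVALID_SET                   # first line, or encoder stalled
        return TfsState.DIU_ACCESS if adv_tag_line else state  # wait for advTagLine
    return TfsState.DIU_ACCESS                                  # TLS_TFSVALID_SET
```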






The TFS state flow diagram is shown in Figure 186.




Figure 186. TFSI FSM State Flow Diagram (states diu_access, tls_load, tls_update, tls_next and tls_tfsvalid_set; transition conditions include diu_tfs_rack, read_num, tfsvalid = 1 AND top_advtagline, and tfsvalid == 0)
26.8.3 Generating a tag from Tables A, B and C
26.8.3 Generating a tag from Tables A, B and C 

Table A contains one entry for each dot position in a line of the tag's bounding box. Each entry specifies whether the dot is part of the constant background pattern or part of the tag's data component. There is one TLS for each dot-line of the tag, so the number of TLSs matches the height of the tag in dot-lines. Table A has 384 entries, corresponding to the maximum width of a tag. The actual number of entries used should match the size of the bounding box for the tag in the dot dimension, but all 384 entries must be present.

Table B contains 32 entries pointing to (in order of appearance) the data dots present in the particular line. Again, all 32 entries must be present, even if fewer are used.
Each output dot value is generated as follows. Each entry in Table A consists of 2 bits, bit0 and bit1. These 2 bits are interpreted according to Table 134, Table 135 and Table 136.



Table 134. Interpretation of bit0 from entry in Table A

bit0   Interpretation
0      The output bit comes directly from bit1 (see Table 135).
1      The output bit comes from a data bit. Bit1 is used in conjunction with Tag Line Structure Table B to determine which data bit will be output.

Table 135. Interpretation of bit1 from entry in Table A when bit0 = 0

bit1   Interpretation
0      Output 0
1      Output 1

Table 136. Interpretation of bit1 from entry in Table A when bit0 = 1

bit1   Interpretation
0      Output the data bit pointed to by the current index into Table B.
1      Output the data bit pointed to by the current index into Table B, and advance the index by 1.



If bit0 = 0 then the output dot for this entry is part of the constant background pattern. The dot value itself comes from bit1, i.e. if bit1 = 0 then the output is 0 and if bit1 = 1 then the output is 1.

If bit0 = 1 then the output dot for this entry comes from the variable or fixed tag data. Bit1 is used in conjunction with Tables B and C to determine the data bits to use.

To understand the interpretation of bit1 when bit0 = 1, we need to know what is stored in Table B. Table B contains the addresses of all the data bits that are used in the particular line of a tag, in order of appearance. The address of the first data dot in the line is therefore given by the address stored in entry 0 of Table B. As we advance along the various data dots, we advance through the various Table B entries.

Each Table B entry is 9 bits long, and each points to a specific variable or fixed data bit for the tag. Each tag contains a maximum of 120 fixed and 360 variable data bits, for a total of 480 data bits. Table 137 shows the interpretation of the 9-bit tag data address.



Table 137. Interpretation of 9-bit tag data address in Table B

Field            Width    Description
CodeWordSelect   3 bits   Select 1 of 8 codewords. Codewords 0, 1, 2, 3, 4, 5 are variable data. Codewords 6, 7 are fixed data.
SymbolSelect     4 bits   Select 1 of 15 symbols (1111 invalid)
BitSelect        2 bits   Select 1 of 4 bits from the selected symbol





data are written to codeword 6, and the symbols derived from fixed data codeword 1 are written to codeword 7. The data symbols are stored first and the remaining redundancy symbols are stored afterwards, for a total of 15 symbols. Thus, when 5 data symbols are used, the 5 symbols derived from bits 0-19 are written to symbols 0-4, and the redundancy symbols are written to symbols 5-14. When 7 data symbols are used, the 7 symbols derived from bits 0-27 are written to symbols 0-6, and the redundancy symbols are written to symbols 7-14.

However, if the fixed data is supplied to the TE in a pre-encoded form, the encoding could theoretically be anything. Consequently the 120 bits of fixed data are copied to codewords 6 and 7 as shown in Table 138.



Table 138. Mapping of fixed data bits to codeword/symbols when no redundancy encoding

Fixed data bits   Codeword symbols   Codeword
0-19              0-4                6
20-39             0-4                7
40-59             5-9                6
60-79             5-9                7
80-99             10-14              6
100-119           10-14              7
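Table 138 follows a simple pattern: the 120 bits form six 20-bit groups, and each group fills five symbols of either codeword 6 or codeword 7, alternating between the two. A Python sketch of the mapping follows. The bit order within each symbol, with the lowest fixed-data bit in bit 0 of the symbol, is an assumption.

```python
def fixed_bit_location(bit: int) -> tuple[int, int, int]:
    """Map pre-encoded fixed data bit 0..119 to (codeword, symbol, bit_in_symbol)."""
    if not 0 <= bit < 120:
        raise ValueError("fixed data bit must be in 0..119")
    group, offset = divmod(bit, 20)          # six 20-bit groups (Table 138 rows)
    codeword = 6 + group % 2                 # rows alternate between codewords 6 and 7
    symbol = 5 * (group // 2) + offset // 4  # symbols 0-4, 5-9, 10-14
    return codeword, symbol, offset % 4


print(fixed_bit_location(45))   # (6, 6, 1)
```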



It is important to note that the interpretation of bit1 from Table A (when bit0 = 1) is relative. A 5-bit index is used to cycle through the data addresses in Table B. Since the first tag on a particular line may or may not start at the first dot in the tag, an initial value for the index into Table B is needed. Subsequent tags on the same line always start with an index of 0, and any partial tag at the end of a line simply finishes before the entire tag has been rendered. The initial index required due to the rendering of a partial tag at the start of a line is supplied by Table C. The initial index will be different for each TLS, and there are two possible initial indexes, since there are effectively two types of rows of tags in terms of initial offsets.

Table C provides the appropriate start index into Table B (2 5-bit indices). When rendering even rows of 
tags, entry 0 is used as the initial index into Table B. and when rendering odd rows of tags, entry 1 is used 
as the initial index into Table B. The second and subsequent tags start at the left most dots'position within 
the tag, so can use an initial index of 0. 
26.8.4 Architecture

A block diagram of the Tag Format Structure Interface can be seen in Figure 187. 
Figure 187. TFS Block Diagram
26.8.4.1 Table A interface



The implementation of table A is two 16x64-bit RAMs with a small amount of control logic, as shown in Figure 188. While one RAM is read from for the current line's table A data (4 bits representing 2 contiguous table A entries), the other RAM is being written to with the next line's table A data (64 bits at a time).



Figure 188. Table A interface block diagram (signals: advTFSLine, tawe, taRdAdr and 64-bit dataIn; two 16x64-bit RAMs, table A (0) and table A (1); outputs taEven (bits 1 & 0), taOdd (bits 3 & 2), ta_dot0_1cyclelater and ta_dot1_1cyclelater)

Note: The Table A data to be printed (if each LSB = 0) must be passed to the top level 2 cycles after the read of Table A, because of the 2-stage pipelining in the TFS that registers the Table A and Table B outputs. Hence there is an extra registering stage for the generation of ta_dot0_1cyclelater and ta_dot1_1cyclelater.

Each time an AdvTFSLine pulse is received, the sense of which RAM is being read from or written to changes. This is accomplished by a 1-bit flag called wrta0. Although the initial state of wrta0 is irrelevant, it must invert upon receipt of an AdvTFSLine pulse. A 4-bit counter called taWrAdr keeps the write address for the 12 writes that occur after the start of each line (specified by the AdvTFSLine control input). The tawe (table A write enable) input is set whenever the input data is to be written to table A. The taWrAdr address counter automatically increments with each write to table A. Address generation for tawe and taWrAdr is shown in Figure 189.
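The ping-pong arrangement described above can be modelled behaviourally as below. This is a sketch, not RTL: the class and method names are hypothetical, it assumes wrta0 = 1 means "write table A (0), read table A (1)", and it assumes taWrAdr restarts at 0 on each AdvTFSLine pulse (the text only says the counter covers the writes after the start of each line).

```python
class TableADoubleBuffer:
    """Behavioural sketch of the two 16x64-bit table A RAMs (not RTL)."""

    WORDS = 16
    MASK64 = (1 << 64) - 1

    def __init__(self) -> None:
        self.ram = [[0] * self.WORDS, [0] * self.WORDS]  # table A (0), table A (1)
        self.wrta0 = 0      # assumed: 1 = write table A (0) and read table A (1)
        self.ta_wr_adr = 0  # 4-bit write address counter

    def adv_tfs_line(self) -> None:
        """AdvTFSLine pulse: swap the roles of the two RAMs."""
        self.wrta0 ^= 1
        self.ta_wr_adr = 0  # assumed: each line's writes start at address 0

    def write(self, data: int) -> None:
        """One 64-bit write with tawe set; taWrAdr auto-increments."""
        bank = 0 if self.wrta0 else 1
        self.ram[bank][self.ta_wr_adr] = data & self.MASK64
        self.ta_wr_adr = (self.ta_wr_adr + 1) % self.WORDS

    def read_pair(self, pair: int) -> int:
        """Read 4 bits holding table A entries 2*pair and 2*pair+1."""
        bank = 1 if self.wrta0 else 0
        word, offset = divmod(pair * 4, 64)
        return (self.ram[bank][word] >> offset) & 0xF
```

Data written during one line only becomes readable after the next `adv_tfs_line()` call, which is the point of the double buffer: the next line's table A data can be fetched while the current line is being rendered.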



Figure 189. Table A address generator (inputs advTFSLine and tawe; 1-bit wrta0 flag; 4-bit taWrAdr counter)
26.8.4.2 Table C interface

A block diagram of the table C interface is shown below in Figure 190. 



Figure 190. Table C interface block diagram (signals include tcwe, advTFSLine and tbRdAdr)

The address generator for table C contains a 5-bit address register adr that is set to a new address at the start of processing the tag (either of the two table C initial values, based on tagAltSense, at the start of the line, and 0 for subsequent tags on the same line). Each cycle, two addresses into table B are generated based on the two 2-bit inputs (in0 and in1). As shown in Table 139, the output address tbRdAdr0 is always adr and tbRdAdr1 is one of adr and adr+1, and at the end of the cycle adr takes on one of adr, adr+1, and adr+2.
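One AdrGen cycle, as described above, can be sketched as follows. The entries of Table 139 are not reproduced here, so the sketch rests on an assumption: a dot consumes a Table B entry exactly when bit 0 of its 2-bit Table A entry is 1 (dots whose LSB is 0 are printed directly from Table A, per the note with Figure 188). It also assumes adr wraps modulo 32 because it is a 5-bit register. `adr_gen` is a hypothetical name.

```python
from typing import Tuple


def adr_gen(adr: int, in0: int, in1: int) -> Tuple[int, int, int]:
    """One AdrGen cycle: returns (tbRdAdr0, tbRdAdr1, next adr).

    Assumption: a dot uses a Table B entry only when bit 0 of its 2-bit
    Table A entry (in0 or in1) is 1.
    """
    uses0 = in0 & 1
    uses1 = in1 & 1
    tb_rd_adr0 = adr                              # always adr
    tb_rd_adr1 = (adr + uses0) & 0x1F             # adr or adr+1
    next_adr = (adr + uses0 + uses1) & 0x1F       # adr, adr+1 or adr+2
    return tb_rd_adr0, tb_rd_adr1, next_adr
```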

Table 139. AdrGen lookup table
a. X = don't care state.



26.8.4.3 Table B interface



The table B interface implementation generates two encoded tag data addresses (tfsi_adr0, tfsi_adr1) based on two table B input addresses (tbRdAdr0, tbRdAdr1). A block diagram of table B can be seen in Figure 191.



Figure 191. Table B interface block diagram (inputs tbRdAdr0 and tbRdAdr1)

Table B data is initially loaded into the 288-bit table B temporary register via the TFS FSM. Once all 288 bits have been loaded from DRAM, the data is written in 9-bit chunks to the 32x9 register arrays based on tbwradr.
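Since 32 entries of 9 bits make exactly 288 bits, the transfer from the temporary register into the 32x9 arrays is a straightforward slicing. A sketch, assuming (not stated in the text) that the entry written at tbwradr = i occupies bits 9i to 9i+8 of the temporary register; `unpack_table_b` is a hypothetical name.

```python
from typing import List


def unpack_table_b(temp_reg: int) -> List[int]:
    """Split the 288-bit table B temporary register into 32 9-bit entries.

    Assumption: the entry for tbwradr = i occupies bits 9*i .. 9*i+8.
    """
    if temp_reg < 0 or temp_reg >> 288:
        raise ValueError("value must fit in 288 bits")
    return [(temp_reg >> (9 * i)) & 0x1FF for i in range(32)]
```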

Each time an AdvTFSLine pulse is received, the sense of which sub-buffer is being read from or written to changes. This is accomplished by a 1-bit flag called wrtb0. Although the initial state of wrtb0 is irrelevant, it must invert upon receipt of an AdvTFSLine pulse.

Note: The output addresses from Table B are registered.
27 Tag FIFO Unit (TFU)



27.1 Overview 

The Tag FIFO Unit (TFU) provides the means by which data is transferred between the Tag Encoder (TE) and the HCU. By abstracting the buffering mechanism and controls from both units, the interface between the data user and the data generator is kept clean.

The TFU is a simple FIFO interface to the HCU. The Tag Encoder will provide support for arbitrary Y integer scaling up to 1600 dpi. X integer scaling of the tag dot data is performed at the output of the FIFO in the TFU. There is feedback from the TFU to the TE to allow stalling of the TE during a line. The TE interfaces to the TFU with a data width of 8 bits. The TFU interfaces to the HCU with a data width of 1 bit. The depth of the TFU FIFO is chosen as 16 bytes so that the FIFO can store a single 126-dot tag.

27.1.1 Interfaces between TE, TFU and HCU



Figure 192. Interfaces between TE, TFU and HCU (TE to TFU: te_tfu_wdata, te_tfu_wdatavalid, te_tfu_wradvline; TFU to TE: tfu_te_oktowrite; HCU to TFU: hcu_tfu_advdot; TFU to HCU: tfu_hcu_tdata, tfu_hcu_avail)

27.1.1.1 TE-TFU Interface 

The interface from the TE to the TFU comprises the following signals: 

• te_tfu_wdata, 8-bit write data.

• te_tfu_wdatavalid, write data valid.

• te_tfu_wradvline, accompanies the last valid 8-bit write data in a line.
The interface from the TFU to TE comprises the following signal: 

• tfu_te_oktowrite, indicating to the TE that there is space available in the TFU FIFO.

The TE writes data to the TFU FIFO as long as the TFU's tfu_te_oktowrite output bit is set. A write only occurs when the data is accompanied by the data valid signal, te_tfu_wdatavalid.

27.1.1.2 TFU-HCU interface

The interface from the TFU to the HCU comprises the following signals: 

• tfu_hcu_tdata, 1-bit data.

• tfu_hcu_avail, data valid signal indicating that there is data available in the TFU FIFO.
The interface from HCU to TFU comprises the following signal:
• hcu_tfu_advdot, indicating to the TFU to supply the next dot.

27.1.1.2.1 X scaling

Tag data is replicated a scale factor (SF) number of times in the X direction to convert the final output to 1600 dpi. Unlike both the CFU and SFU, which support non-integer scaling, the TFU scaling is integer only. Replication in the X direction is performed at the output of the TFU FIFO on a dot-by-dot basis.

To account for the case where there may be two SoPEC devices, each generating its own portion of a dot line, the first dot in a line may not be replicated the total scale-factor number of times by an individual TFU. The dot will ultimately be scaled up correctly, with both devices doing part of the scaling, one on its lead-out and the other on its lead-in.
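The X scaling described above, together with the XScale, XFracScale and HCUDotCount registers of Table 141, can be sketched for a single line as follows. This is a behavioural model under stated assumptions, not the TFU logic: `tfu_line` is a hypothetical name, dots are taken from each byte LSB first (RdBit 0 to 7, as in the pseudocode of section 27.4), every line is assumed to start with the XFracScale count, and both scale factors are assumed to be at least 1.

```python
from typing import List


def tfu_line(line_bytes: List[int], xscale: int, xfracscale: int,
             hcu_dot_count: int) -> List[int]:
    """Return the X-scaled dots the TFU would hand to the HCU for one line.

    The first dot is replicated xfracscale times and every later dot xscale
    times; output stops after hcu_dot_count dots and the rest of the current
    byte is dropped. line_bytes must hold enough bytes for hcu_dot_count.
    """
    if xscale < 1 or xfracscale < 1:
        raise ValueError("scale factors are assumed to be at least 1")
    dots: List[int] = []
    rep_left = xfracscale          # copies left of the current dot
    byte_idx, rd_bit = 0, 0
    while len(dots) < hcu_dot_count:
        dots.append((line_bytes[byte_idx] >> rd_bit) & 1)
        if rep_left == 1:          # last copy: move on to the next bit
            rep_left = xscale
            rd_bit += 1
            if rd_bit == 8:
                rd_bit = 0
                byte_idx += 1
        else:
            rep_left -= 1
    return dots
```

With two SoPECs, the left device would end its line partway through replicating a dot and the right device would start with a smaller XFracScale, so that the two partial replications add up to the full scale factor.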

Note that two SoPEC TEs may be involved in producing the same byte of output tag data straddling the printhead boundary. The HCU of the left SoPEC will accept from its TE the correct number of dots, ignoring any dots in the last byte that do not apply to its printhead. The TE of the right SoPEC will be programmed with the correct number of dots into the tag, and its output will be byte aligned with the left edge of the printhead.
27.2 Definitions of I/O



Table 140. TFU Port List 









Port name | Pins | I/O | Description
Clocks and Resets
pclk | 1 | In | SoPEC functional clock.
prst_n | 1 | In | Global reset signal.
PCU Interface data and control signals
pcu_addr[3:2] | 2 | In | PCU address bus. Only 2 bits are required to decode the address space for this block.
pcu_dataout[31:0] | 32 | In | Shared write data bus from the PCU.
tfu_pcu_datain[31:0] | 32 | Out | Read data bus from the TFU to the PCU.
pcu_rwn | 1 | In | Common read/not-write signal from the PCU.
pcu_tfu_sel | 1 | In | Block select from the PCU. When pcu_tfu_sel is high both pcu_addr and pcu_dataout are valid.
tfu_pcu_rdy | 1 | Out | Ready signal to the PCU. When tfu_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on tfu_pcu_datain is valid.
TE Interface data and control signals
te_tfu_wdata[7:0] | 8 | In | Write data for TFU FIFO.
te_tfu_wdatavalid | 1 | In | Write data valid signal.
te_tfu_wradvline | 1 | In | Advance line signal strobed when the last byte in a line is placed on te_tfu_wdata.
tfu_te_oktowrite | 1 | Out | Ready signal indicating TFU has space available in its FIFO and is ready to be written to.
HCU Interface data and control signals
hcu_tfu_advdot | 1 | In | Signal indicating to the TFU that the HCU is ready to accept the next dot of data from TFU.
tfu_hcu_tdata | 1 | Out | Data from the TFU FIFO.
tfu_hcu_avail | 1 | Out | Signal indicating valid data available from TFU FIFO.



27.3 Configuration Registers 



Table 141. TFU Configuration Registers 







Address | Register | #bits | Reset value | Description
Control registers
0x00 | Reset | 1 | 1 | A write to this register causes a reset of the TFU. This register can be read to indicate the reset state: 0 = reset in progress, 1 = reset not in progress.
0x04 | Go | 1 | see text | Writing 1 to this register starts the TFU. Writing 0 to this register halts the TFU. When Go is deasserted the state-machines go to their idle states but all counters and configuration registers keep their values. When Go is asserted all counters are reset, but configuration registers keep their values (i.e. they don't get reset). The TFU must be started before the TE is started. This register can be read to determine if the TFU is running (1 = running, 0 = stopped).
Setup registers (constant during processing of page)
0x08 | XScale | 8 | 1 | Tag scale factor in X direction.
0x0C | XFracScale | 8 | 1 | Tag scale factor in X direction for the first dot in a line.
0x10 | TEByteCount | 12 | 0 | The number of bytes to be accepted from the TE per line. Once this number of bytes has been received, subsequent bytes are ignored until there is a strobe on te_tfu_wradvline.
0x14 | HCUDotCount | 15 | 0 | The number of (optionally) x-scaled dots per line to be supplied to the HCU. Once this number has been reached the remainder of the current FIFO byte is ignored.


27.4 Detailed description 

The FIFO is a simple 16-byte store with read and write pointers and a contents counter, as shown in Figure 193. 16 bytes are sufficient to store a single 126-dot tag.

For each line, a total of TEByteCount bytes is written into the FIFO. All subsequent bytes are ignored until there is a strobe on the te_tfu_wradvline signal, whereupon bytes for the next line are stored.

On the HCU side, a total of HCUDotCount dots are produced at the output. Once this count is reached, any more dots in the FIFO byte currently being processed are ignored. For the first dot in the next line the start-of-line scale factor, XFracScale, is used.
The behaviour of these signals and the control signals between the TFU and the TE and HCU is detailed
below. 



Figure 193. 16-byte FIFO in TFU (FifoWrPtr, te_tfu_wdata, Fifo, RdBit, tfu_hcu_tdata, FifoRdPtr)

// Concurrently executed code:

// TE always allowed to write when there's either (a) room or (b) no room and all
// bytes for that line have been received.
if ((FifoCntnts != FifoMax) OR (FifoCntnts == FifoMax AND ByteToRx == 0)) then
    tfu_te_oktowrite = 1
else
    tfu_te_oktowrite = 0

// Data presented to HCU when there is (a) data in FIFO and (b) the HCU has
// not yet received all dots for the line
if ((FifoCntnts != 0) AND (BitToTx != 0)) then
    tfu_hcu_avail = 1
else
    tfu_hcu_avail = 0

// Output vector of FIFO data
tfu_hcu_tdata = Fifo[FifoRdPnt][RdBit]

// Sequentially executed code:

if ((te_tfu_wdatavalid == 1) AND (FifoCntnts != FifoMax) AND (ByteToRx != 0)) then
    Fifo[FifoWrPnt] = te_tfu_wdata
    FifoWrPnt++
    FifoCntnts++
    ByteToRx--

if (te_tfu_wradvline == 1) then
    ByteToRx = TEByteCount

if ((hcu_tfu_advdot == 1) AND (FifoCntnts != 0)) then
    BitToTx--
    if (RepFrac == 1) then
        RepFrac = XScale
        if (RdBit == 7) then
            RdBit = 0
            FifoRdPnt++
            FifoCntnts--
        else
            RdBit++
    else
        RepFrac--
    if (BitToTx == 1) then
        RepFrac = XFracScale
        RdBit = 0
        FifoRdPnt++
        FifoCntnts--
        BitToTx = HCUDotCount
What is not detailed above is that, since this is a circular buffer, both the FIFO read and write pointers wrap around to zero after they reach 15. Also not detailed is that if both the read and write pointers change in the same cycle, the FIFO contents counter remains unchanged.
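The pointer wrap-around and the "simultaneous read and write leaves the count unchanged" rule can be checked with a small cycle-level model of the 16-byte FIFO. This is a sketch with hypothetical names (`TfuFifo`, `clock`, `ok_to_write`); the `write` and `read` arguments stand for the already-qualified enables (te_tfu_wdatavalid with ByteToRx != 0, and hcu_tfu_advdot at the end of a byte), and the pointers are assumed to wrap modulo the 16-byte depth.

```python
class TfuFifo:
    """Cycle-level sketch of the 16-byte TFU FIFO pointers and contents counter."""

    FIFO_MAX = 16

    def __init__(self) -> None:
        self.mem = [0] * self.FIFO_MAX
        self.wr_ptr = 0
        self.rd_ptr = 0
        self.count = 0

    def ok_to_write(self, bytes_to_rx: int) -> bool:
        # Room in the FIFO, or full but all bytes for the line already received
        return self.count != self.FIFO_MAX or bytes_to_rx == 0

    def clock(self, write: bool, wdata: int, read: bool) -> None:
        """Apply one clock cycle using the counter value from before the edge."""
        do_write = write and self.count != self.FIFO_MAX
        do_read = read and self.count != 0
        if do_write:
            self.mem[self.wr_ptr] = wdata & 0xFF
            self.wr_ptr = (self.wr_ptr + 1) % self.FIFO_MAX  # wraps 15 -> 0
        if do_read:
            self.rd_ptr = (self.rd_ptr + 1) % self.FIFO_MAX
        # A write and a read in the same cycle leave the count unchanged
        self.count += int(do_write) - int(do_read)
```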
28 Halftoner Compositor Unit (HCU)

28.1 Overview 

The Halftoner Compositor Unit (HCU) produces dots for each nozzle in the destination printhead, taking account of the page dimensions (including margins). The spot data and tag data are received in bi-level form, while the pixel contone data received from the CFU must be dithered to a bi-level representation. The resultant 6 bi-level planes for each dot position on the page are then remapped to 6 output planes and output a dot at a time (6 bits) to the next stage in the printing pipeline, namely the dead nozzle compensator (DNC).

28.2 Data flow 

Figure 194 shows a single dot data flow high level block diagram of the HCU. The HCU reads contone data from the CFU, bi-level spot data from the SFU, and bi-level tag data from the TFU. Dither matrices are read from the DRAM via the DIU. The calculated output dot (6 bits) is read by the DNC.



Figure 194. High level block diagram showing the HCU and its external interfaces (contone FIFO unit, DRAM interface unit, spot FIFO unit and tag FIFO unit interfaces feeding the Halftoner/Compositor Unit, which outputs to the dead nozzle compensator)

The HCU is given the page dimensions (including margins), and is only started once for the page. It does not need to be programmed in between bands or restarted for each band. The HCU will stall appropriately if its input buffers are starved. At the end of the page the HCU will continue to produce 0 for all dots as long as data is requested by the units further down the pipeline (this allows later units to conveniently flush pipelined data).

The HCU performs a linear processing of dots, calculating the 6-bit output of a dot in each cycle. The mapping of 6 calculated bits to 6 output bits for each dot allows for such example mappings as compositing of the spot0 layer over the appropriate contone layer (typically black), the merging of CMY into K (if K is present in the printhead), the splitting of K into CMY dots if there is no K in the printhead, and the generation of a fixative output bitstream.
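A common way to realise an arbitrary 6-in, 6-out mapping is a 64-entry truth table per output bit, which would match the 64-bit IOMapping value per output ink in Table 143 (IOMappingLo holding bits 0-31, IOMappingHi bits 32-63). The sketch below assumes that interpretation, namely output ink n = bit v of IOMapping[n] when the 6 calculated bits form the value v, and assumes the input bit order [tp, s, cp3, cp2, cp1, cp0] (bit 5 down to bit 0) from the HcuDotgenDebug register. The helper names are hypothetical.

```python
from typing import Callable, List


def mapping_from(fn: Callable[[int], int]) -> int:
    """Build a 64-bit lookup value whose bit v is fn(v) & 1 for each 6-bit input v."""
    return sum(1 << v for v in range(64) if fn(v) & 1)


def dot_reorg(calc_bits: int, io_mapping: List[int]) -> int:
    """Map the 6 calculated bits to up to 6 output bits, one 64-bit table per ink.

    For a register pair, a table would be (IOMappingHi << 32) | IOMappingLo.
    """
    out = 0
    for ink, table in enumerate(io_mapping):
        out |= ((table >> (calc_bits & 0x3F)) & 1) << ink
    return out
```

For example, an identity mapping is `[mapping_from(lambda v, n=n: v >> n) for n in range(6)]`; compositing spot (bit 4) over contone plane 3 (taken here, hypothetically, to be black) replaces entry 3 with `mapping_from(lambda v: (v >> 3) | (v >> 4))`.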
28.3 DRAM storage requirements



SoPEC allows for a number of different dither matrix configurations up to 256 bytes wide. The dither matrix is stored in DRAM. Using either a single- or double-buffer scheme, a line of the dither matrix must be read in by the HCU over a SoPEC line time. SoPEC must produce 13824 dots per line for A4/Letter printing, which takes 13824 cycles.

The following gives the storage and bandwidth requirements for some of the possible configurations of the dither matrix.

• 4 Kbyte DRAM storage required for one 64x64 (preferred) byte dither matrix

• 6.25 Kbyte DRAM storage required for one 80x80 byte dither matrix 

• 16 Kbyte DRAM storage required for four 64x64 byte dither matrices 

• 64 Kbyte DRAM storage required for one 256x256 byte dither matrix 

Note that regardless of the width of the dither matrix, 256 bytes are always read from DRAM for each line. 
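The storage figures above follow from width x height x count bytes, and because 256 bytes are read for every line, the DRAM read bandwidth for the dither matrix is independent of its width. A short check of both, assuming one dot per cycle as stated above; the function names are hypothetical.

```python
def dither_storage_kbytes(width: int, height: int, count: int = 1) -> float:
    """DRAM needed for `count` dither matrices of width x height bytes, in Kbytes."""
    return width * height * count / 1024


def dither_read_bits_per_cycle(dots_per_line: int = 13824,
                               bytes_per_line: int = 256) -> float:
    """Average DRAM read bandwidth when 256 bytes are fetched once per line time."""
    return bytes_per_line * 8 / dots_per_line
```

For A4/Letter this gives 2048 bits over 13824 cycles, roughly 0.15 bits per cycle, whichever matrix size from the list above is used.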
28.4 Implementation

A block diagram of the HCU is given in Figure 195.



Figure 195. Block diagram of the HCU (inputs from the Contone FIFO Unit, Spot FIFO Unit and Tag FIFO Unit)
28.4.1 Definition of I/O

Table 142. HCU port list and description 











Port name | Pins | I/O | Description
Clocks and reset
pclk | 1 | In | System clock.
prst_n | 1 | In | System reset, synchronous active low.
PCU interface
pcu_hcu_sel | 1 | In | Block select from the PCU. When pcu_hcu_sel is high both pcu_adr and pcu_dataout are valid.
pcu_rwn | 1 | In | Common read/not-write signal from the PCU.
pcu_adr[7:2] | 6 | In | PCU address bus. Only 6 bits are required to decode the address space for this block.
pcu_dataout[31:0] | 32 | In | Shared write data bus from the PCU.
hcu_pcu_rdy | 1 | Out | Ready signal to the PCU. When hcu_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on hcu_pcu_data is valid.
hcu_pcu_data[31:0] | 32 | Out | Read data bus to the PCU.
DIU interface
hcu_diu_rreq | 1 | Out | HCU read request, active high. A read request must be accompanied by a valid read address.
diu_hcu_rack | 1 | In | Acknowledge from DIU, active high. Indicates that a read request has been accepted and the new read address can be placed on the address bus, hcu_diu_radr.
hcu_diu_radr[21:5] | 17 | Out | HCU read address. 17 bits wide (256-bit aligned word).
diu_hcu_rvalid | 1 | In | Read data valid, active high. Indicates that valid read data is now on the read data bus, diu_data.
diu_data[63:0] | 64 | In | Read data from DIU.
CFU interface
cfu_hcu_avail | 1 | In | Indicates valid data present on cfu_hcu_c[3-0]data lines.
cfu_hcu_c0data[7:0] | 8 | In | Pixel of data in contone plane 0.
cfu_hcu_c1data[7:0] | 8 | In | Pixel of data in contone plane 1.
cfu_hcu_c2data[7:0] | 8 | In | Pixel of data in contone plane 2.
cfu_hcu_c3data[7:0] | 8 | In | Pixel of data in contone plane 3.
hcu_cfu_advdot | 1 | Out | Informs the CFU that the HCU has captured the pixel data on cfu_hcu_c[3-0]data lines and the CFU can now place the next pixel on the data lines.
SFU interface
sfu_hcu_avail | 1 | In | Indicates valid data present on sfu_hcu_sdata.
sfu_hcu_sdata | 1 | In | Bi-level dot data.
hcu_sfu_advdot | 1 | Out | Informs the SFU that the HCU has captured the dot data on sfu_hcu_sdata and the SFU can now place the next dot on the data line.
TFU interface
tfu_hcu_avail | 1 | In | Indicates valid data present on tfu_hcu_tdata.
tfu_hcu_tdata | 1 | In | Tag dot data.
hcu_tfu_advdot | 1 | Out | Informs the TFU that the HCU has captured the dot data on tfu_hcu_tdata and the TFU can now place the next dot on the data line.
DNC interface
dnc_hcu_ready | 1 | In | Indicates that DNC is ready to accept data from the HCU.
hcu_dnc_avail | 1 | Out | Indicates valid data present on hcu_dnc_data.
hcu_dnc_data[5:0] | 6 | Out | Output bi-level dot data in 6 ink planes.



28.4.2 Configuration Registers 

The configuration registers in the HCU are programmed via the PCU interface. Refer to section 21.8.2 on page 257 for the description of the protocol and timing diagrams for reading and writing registers in the HCU. Note that since addresses in SoPEC are byte aligned and the PCU only supports 32-bit register reads and writes, the lower 2 bits of the PCU address bus are not required to decode the address space for the HCU. When reading a register that is less than 32 bits wide, zeros should be returned on the upper unused bit(s) of hcu_pcu_data. The configuration registers of the HCU are listed in Table 143.



Table 143. HCU Registers 













Address | Register | #bits | Reset value | Description
Control registers
0x00 | Reset | 1 | 0x1 | A write to this register causes a reset of the HCU.
0x04 | Go | 1 | 0x0 | Writing 1 to this register starts the HCU. Writing 0 to this register halts the HCU. When Go is asserted all counters, flags etc. are cleared or given their initial value, but configuration registers keep their values. When Go is deasserted the state-machines go to their idle states but all counters and configuration registers keep their values. The HCU should be started after the CFU, SFU, TFU, and DNC. This register can be read to determine if the HCU is running (1 = running, 0 = stopped).
Setup registers (constant during processing)
0x10 | AvailMask | 4 | 0x0 | Mask used to determine which of the dotgen units etc. are to be checked before a dot is generated by the HCU within the specified margins for the specified color plane. If the specified dotgen unit is stalled, then the HCU will also stall. See Table 144 for bit allocation and definition.
0x14 | TMMask | 4 | 0x0 | Same as AvailMask, but used in the top margin area before the appropriate target page is reached.
0x18 | PageMarginY | 32 | 0x0000_0000 | The first line considered to be off the page.
0x1C | MaxDot | 16 | 0x0000 | The maximum dot number present across a page, i.e. the number of dots minus 1. For example, if a page contains 13824 dots, then MaxDot will be 13823.
0x20 | TopMargin | 32 | 0x0000_0000 | The first line on a page to be considered within the target page for contone and spot data (0 = first printed line of page).
0x24 | BottomMargin | 32 | 0x0000_0000 | The first line in the target bottom margin for contone and spot data (i.e. first line after target page).
0x28 | LeftMargin | 16 | 0x0000 | The first dot on a line within the target page for contone and spot data.
0x2C | RightMargin | 16 | 0xFFFF | The first dot on a line within the target right margin for contone and spot data.
0x30 | TagTopMargin | 32 | 0x0000_0000 | The first line on a page to be considered within the target page for tag data (0 = first printed line of page).
0x34 | TagBottomMargin | 32 | 0x0000_0000 | The first line in the target bottom margin for tag data (i.e. first line after target page).
0x38 | TagLeftMargin | 16 | 0x0000 | The first dot on a line within the target page for tag data.
0x3C | TagRightMargin | 16 | 0xFFFF | The first dot on a line within the target right margin for tag data.
0x40 | (name illegible) | 1 | 0x0 | 1 if a dither matrix is specified, 0 if a dither matrix is not specified.
0x44 | StartDMAdr | 17 | 0x0_0000 | Points to the first 256-bit word of the first line of the dither matrix in DRAM.
0x48 | EndDMAdr | 17 | 0x0_0000 | Points to the last 256-bit word of the last line of the dither matrix in DRAM.
0x4C | LineIncrement | 5 | 0x2 | The number of 256-bit words in DRAM from the start of one line of the dither matrix to the start of the next line, i.e. the value by which the DRAM address is incremented at the start of a line so that it points to the start of the next line of the dither matrix.
0x50 | DMInitIndexC0 | 8 | 0x00 | Initial index within 256-byte dither matrix line buffer for contone plane 0. If using double-buffer scheme, only the 7 lsbs are used.
0x54 | DMLwrIndexC0 | 8 | 0x00 | Lower index within 256-byte dither matrix line buffer for contone plane 0. If using double-buffer scheme, only the 7 lsbs are used.
0x58 | DMUprIndexC0 | 8 | 0x3F | Upper index within 256-byte dither matrix line buffer for contone plane 0. After reading the data at this location the index wraps to DMLwrIndexC0. If using double-buffer scheme, only the 7 lsbs are used.
0x5C | DMInitIndexC1 | 8 | 0x00 | Initial index within 256-byte dither matrix line buffer for contone plane 1. If using double-buffer scheme, only the 7 lsbs are used.
0x60 | DMLwrIndexC1 | 8 | 0x00 | Lower index within 256-byte dither matrix line buffer for contone plane 1. If using double-buffer scheme, only the 7 lsbs are used.
0x64 | DMUprIndexC1 | 8 | 0x3F | Upper index within 256-byte dither matrix line buffer for contone plane 1. After reading the data at this location the index wraps to DMLwrIndexC1. If using double-buffer scheme, only the 7 lsbs are used.
0x68 | DMInitIndexC2 | 8 | 0x00 | Initial index within 256-byte dither matrix line buffer for contone plane 2. If using double-buffer scheme, only the 7 lsbs are used.
0x6C | DMLwrIndexC2 | 8 | 0x00 | Lower index within 256-byte dither matrix line buffer for contone plane 2. If using double-buffer scheme, only the 7 lsbs are used.
0x70 | DMUprIndexC2 | 8 | 0x3F | Upper index within 256-byte dither matrix line buffer for contone plane 2. After reading the data at this location the index wraps to DMLwrIndexC2. If using double-buffer scheme, only the 7 lsbs are used.
0x74 | DMInitIndexC3 | 8 | 0x00 | Initial index within 256-byte dither matrix line buffer for contone plane 3. If using double-buffer scheme, only the 7 lsbs are used.
0x78 | DMLwrIndexC3 | 8 | 0x00 | Lower index within 256-byte dither matrix line buffer for contone plane 3. If using double-buffer scheme, only the 7 lsbs are used.
0x7C | DMUprIndexC3 | 8 | 0x3F | Upper index within 256-byte dither matrix line buffer for contone plane 3. After reading the data at this location the index wraps to DMLwrIndexC3. If using double-buffer scheme, only the 7 lsbs are used.
0x80 | DoubleLineBuf | 1 | 0x1 | Selects the dither line buffer mode to be single or double buffer.
0x84 to 0x98 | IOMappingLo | 6x32 | 0x0000_0000 | The dot reorg mapping for output inks 0 to 5. For each ink's 64-bit IOMapping value, IOMappingLo represents the low order 32 bits.
0x9C to 0xB0 | IOMappingHi | 6x32 | 0x0000_0000 | The dot reorg mapping for output inks 0 to 5. For each ink's 64-bit IOMapping value, IOMappingHi represents the high order 32 bits.
0xB4 to 0xC0 | cpConstant | 4x8 | 0x00 | The constant contone value to output for contone plane N when printing in the margin areas of the page. This value will typically be 0.
0xC4 | sConstant | 1 | 0x0 | The constant bi-level value to output for spot when printing in the margin areas of the page. This value will typically be 0.
0xC8 | tConstant | 1 | 0x0 | The constant bi-level value to output for tag data when printing in the margin areas of the page. This value will typically be 0.
0xCC | DitherConstant | 8 | 0xFF | The constant value to use for the dither matrix when the dither matrix is not available, i.e. when the signal dm_avail is 0. This value will typically be 0xFF so that cpConstant can easily be 0x00 or 0xFF without requiring a dither matrix (DitherConstant is primarily used for threshold dithering in the margin areas).
Debug registers (read only)
0xD0 | HcuPortsDebug | 14 | - | Bit 13 = tfu_hcu_avail; Bit 12 = hcu_tfu_advdot; Bit 11 = sfu_hcu_avail; Bit 10 = hcu_sfu_advdot; Bit 9 = cfu_hcu_avail; Bit 8 = hcu_cfu_advdot; Bit 7 = dnc_hcu_ready; Bit 6 = hcu_dnc_avail; Bits 5-0 = hcu_dnc_data
0xD4 | HcuDotgenDebug | 14 | N/A | Bit 13 = in_tag_target_page; Bit 12 = in_target_page; Bit 11 = tp_avail; Bit 10 = s_avail; Bit 9 = cp_avail; Bit 7 = advdot; Bit 6 = dm_avail; Bits 5-0 = [tp, s, cp3, cp2, cp1, cp0] (i.e. the 6-bit input to the dot reorg units)
0xD8 | HcuDitherDebug1 | 17 | N/A | Bit 9 = advdot; Bit 8 = dm_avail; Bits 15-8 = cp1_dither_val; Bits 7-0 = cp0_dither_val
0xDC | HcuDitherDebug2 | 17 | N/A | Bit 9 = advdot; Bit 8 = dm_avail; Bits 15-8 = cp3_dither_val; Bits 7-0 = cp2_dither_val



28.4.3 Control unit 

The control unit is responsible for controlling the overall flow of the HCU. It is responsible for determining whether or not a dot will be generated in a given cycle, and what dot will actually be generated, including whether or not the dot is in a margin area, and what dither cell values should be used at the specific dot location. A block diagram of the control unit is shown in Figure 196.

The inputs to the control unit are a number of avail flags specifying whether or not a given dotgen unit is capable of supplying 'real' data in this cycle. The term 'real' refers to data generated from external sources, such as contone line buffers, bi-level line buffers, and tag plane buffers. Each dotgen unit informs the control unit whether or not a dot can be generated this cycle from real data. The control unit must also check that the DNC is ready to receive data.

The contone/spot margin unit is responsible for determining whether the current dot coordinate is within 
the target contone/spot margins, and the tag margin unit is responsible for determining whether the current 
dot coordinate is within the target tag margins. 
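Reading the margin register descriptions in Table 143 literally (TopMargin and LeftMargin give the first line/dot inside the target page, BottomMargin and RightMargin the first line/dot outside it), each margin unit reduces to a half-open range test. A minimal sketch under that reading; the `Margins` class and `in_target` function are hypothetical, and the same test would apply to the Tag* registers for the tag margin unit.

```python
from dataclasses import dataclass


@dataclass
class Margins:
    top: int     # first line inside the target page (e.g. TopMargin)
    bottom: int  # first line of the bottom margin (e.g. BottomMargin)
    left: int    # first dot inside the target page (e.g. LeftMargin)
    right: int   # first dot of the right margin (e.g. RightMargin)


def in_target(m: Margins, line: int, dot: int) -> bool:
    """True when (line, dot) lies inside the target area of a margin unit."""
    return m.top <= line < m.bottom and m.left <= dot < m.right
```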

The dither matrix table interface provides the interface to DRAM for the generation of dither cell values 
that are used in the halftoning process in the contone dotgen unit. 
Figure 196. Block diagram of the control unit (determine advdot, position unit, contone/spot margin unit, tag margin unit and dither matrix table interface; external signals hcu_dnc_avail, dnc_hcu_ready, hcu_diu_rreq, diu_hcu_rack, hcu_diu_radr and diu_data)



28.4.3.1 Determine AdvDot



The HCU does not always require contone planes, bi-level or tag planes in order to produce a page. For example, a given page may not have a bi-level layer, or a tag layer. In addition, the contone and bi-level parts of a page are only required within the contone and bi-level page margins, and the tag part of a page is only required within the tag page margins. Thus output dots can be generated without contone, bi-level or tag data before the respective top margins of a page have been reached, and 0s are generated for all color planes after the end of the page has been reached (to allow later stages of the printing pipeline to flush).

Consequently the HCU has an AvailMask register that determines which of the various input avail flags should be taken notice of during the production of a page from the first line of the target page, and a TMMask register that has the same behaviour, but is used in the lines before the target page has been reached (i.e. inside the target top margin area). Each bit in the AvailMask refers to a particular avail bit: if the bit in the AvailMask register is set, then the corresponding avail bit must be 1 for the HCU to advance a dot. The bit to avail correspondence is shown in Table 144. Care should be taken with TMMask: if the particular data is not available after the top margin has been reached, then the HCU will stall. Note that the avail bits for contone and spot colors are ANDed with in_target_page after the target page area has been reached to allow dot production in the contone/spot margin areas without needing any data in the CFU and



Doc: SoPEC_hardware_design 
Version: 2.3 



63 Proprietary Document 



29 Nov 2002 
Page 435 



SoPEC : Hardware Design 



SFU. The avail bit for tag color is ANDed with in_tag_targe(^age after the target tag page area has been 
reached to allow dot production in the tag margin areas without needing any data in the TFU. 



Table 144. Correspondence between bit in AvailMask and avail flag

Bit   Avail flag   Description
0     dm_avail     dither matrix data available
1     cp_avail     contone pixels available
2     s_avail      spot color available
3     tp_avail     tag plane available


Each of the input avail bits is processed with its appropriate mask bit and the after_top_margin flag. The
output bits are ANDed together along with Go and ok_to_write (which specifies whether the output buffer
is ready to receive a dot in this cycle) to form the output bit advdot. We also generate wr_advdot. In this
way, if the output buffer is full or any of the specified avail flags is clear, the HCU will stall. When the end
of the page is reached, in_page will be deasserted and the HCU will continue to produce 0 for all dots as
long as the DNC requests data. A block diagram of the determine advdot unit is shown in Figure 197.

The ok_to_read signal from the output buffer indicates that the HCU has a dot available for the DNC to
read (indicated to the DNC by the assertion of hcu_dnc_avail). If the DNC is ready to receive the dot
(dnc_hcu_ready is 1) then the dot is read from the output buffer by asserting rd_advdot.
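The advdot logic described above can be summarised as a small Python model. This is a sketch under stated
assumptions, not the RTL: the field names follow the signal names in the text; we read the statement that the
contone/spot (and tag) avail bits are ANDed with in_target_page (in_tag_target_page) as gating the requirement
for that data, since that is what lets margin dots be produced without CFU/SFU/TFU data; we assume the avail
flags are ignored once in_page is deasserted; and we assume wr_advdot simply equals advdot.

```python
from dataclasses import dataclass

# Bit positions in AvailMask / TMMask (Table 144).
DM, CP, SP, TP = 0, 1, 2, 3


@dataclass
class HcuFlags:
    go: bool
    ok_to_write: bool           # output buffer can accept a dot this cycle
    ok_to_read: bool            # output buffer holds a dot for the DNC
    dnc_hcu_ready: bool
    in_page: bool
    after_top_margin: bool      # from the contone/spot margin unit
    in_target_page: bool
    after_tag_top_margin: bool  # from the tag margin unit
    in_tag_target_page: bool
    avail: tuple                # (dm_avail, cp_avail, s_avail, tp_avail)
    avail_mask: int             # 4-bit AvailMask register
    tm_mask: int                # 4-bit TMMask register


def must_be_avail(f: HcuFlags, bit: int) -> bool:
    """True if the avail flag at `bit` has to be 1 before a dot can advance."""
    if bit == TP:
        past_top, in_target = f.after_tag_top_margin, f.in_tag_target_page
    else:
        past_top, in_target = f.after_top_margin, f.in_target_page
    if not past_top:
        # Inside the target top margin the TMMask register applies.
        return bool((f.tm_mask >> bit) & 1)
    required = bool((f.avail_mask >> bit) & 1)
    # Assumption: contone, spot and tag data is only demanded inside the
    # target area, so margin dots are produced without CFU/SFU/TFU data.
    if bit in (CP, SP, TP):
        required = required and in_target
    return required


def determine_advdot(f: HcuFlags):
    """Return (advdot, wr_advdot, hcu_dnc_avail, rd_advdot)."""
    flags_ok = all(f.avail[b] or not must_be_avail(f, b) for b in (DM, CP, SP, TP))
    # Assumption: once in_page drops, 0s are produced without waiting on data.
    advdot = f.go and f.ok_to_write and (flags_ok or not f.in_page)
    wr_advdot = advdot                  # assumed: write the new dot into the buffer
    hcu_dnc_avail = f.ok_to_read
    rd_advdot = f.ok_to_read and f.dnc_hcu_ready
    return advdot, wr_advdot, hcu_dnc_avail, rd_advdot
```

With AvailMask set to 0xF, a missing contone pixel stalls the HCU inside the target page but not in the left or
right target margins.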
Figure 197. Block diagram of determine advdot unit (mask, avail and margin flags in; advdot, wr_advdot,
hcu_dnc_avail and rd_advdot out)

28.4.3.2 Position unit



The position unit is responsible for outputting the position of the current dot (curr_pos, curr_line) and
whether or not this dot is the last dot of a line (advline). Both curr_pos and curr_line are set to 0 at reset or
when Go transitions from 0 to 1. The position unit relies on the advdot input signal to advance through the
dots on a page. Whenever an advdot pulse is received, curr_pos gets incremented. If curr_pos equals
max_dot then an advline pulse is generated as this is the last dot in a line, curr_line gets incremented, and
curr_pos is reset to 0 to start counting the dots for the next line.
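A minimal Python model of the position unit, assuming max_dot is the index of the last dot in a line (so a line
holds max_dot + 1 dots) and that advline is reported for the same advdot that consumes the last dot. The class
name is hypothetical.

```python
class PositionUnit:
    """Dot/line counter driven by advdot (sketch)."""

    def __init__(self, max_dot: int):
        self.max_dot = max_dot      # index of the last dot in a line
        self.reset()

    def reset(self) -> None:
        """Reset, or Go transitioning from 0 to 1."""
        self.curr_pos = 0
        self.curr_line = 0

    def step(self, advdot: bool) -> bool:
        """Advance one cycle; return True (the advline pulse) on the last dot."""
        if not advdot:
            return False
        if self.curr_pos == self.max_dot:
            self.curr_pos = 0       # start counting the next line
            self.curr_line += 1
            return True
        self.curr_pos += 1
        return False
```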



28.4.3.3 Margin unit



The responsibility of the margin unit is to determine whether the specific dot coordinate is within the page
at all, within the target page or in a margin area (see Figure 198). This unit is instantiated for both the
contone/spot margin unit and the tag margin unit.
Figure 198. Page structure (the target page, surrounded by the target top, bottom, left and right margins, lies
within the printable page area of the physical page)

The margin unit takes the current dot and line position, and returns three flags:

• the first, in_page, is 1 if the current dot is within the page, and 0 if it is outside the page.

• the second flag, in_target_page, is 1 if the dot coordinate is within the target page area of the page, and
0 if it is within the target top/left/bottom/right margins.

• the third flag, after_top_margin, is 1 if the current dot is below the target top margin, and 0 if it is
within the target top margin.

A block diagram of the margin unit is shown in Figure 199. 



Figure 199. Block diagram of margin unit (inputs: curr_line, curr_pos, top_margin, bottom_margin,
page_margin_y, left_margin, right_margin; outputs: in_page, in_target_page, after_top_margin)
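The three flags can be sketched in Python as below. Figure 199 names the inputs but not the comparisons, so
the encoding here is an assumption: all margins are treated as absolute dot/line coordinates with half-open
ranges (top_margin and left_margin are the first target line and dot, bottom_margin and right_margin the first
line and dot past the target page, page_margin_y the number of lines in the page). The Margins type is
hypothetical.

```python
from typing import NamedTuple


class Margins(NamedTuple):
    # Hypothetical encoding of the Figure 199 inputs, in dots/lines.
    top_margin: int      # first line of the target page
    bottom_margin: int   # first line below the target page
    left_margin: int     # first dot of the target page in a line
    right_margin: int    # first dot to the right of the target page
    page_margin_y: int   # number of lines in the page


def margin_flags(curr_pos: int, curr_line: int, m: Margins):
    """Return (in_page, in_target_page, after_top_margin)."""
    in_page = curr_line < m.page_margin_y
    after_top_margin = curr_line >= m.top_margin
    in_target_page = (in_page and after_top_margin
                      and curr_line < m.bottom_margin
                      and m.left_margin <= curr_pos < m.right_margin)
    return in_page, in_target_page, after_top_margin
```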






28.4.3.4 Dither matrix table interface 



The dither matrix table interface provides the interface to DRAM for the generation of dither cell values
that are used in the halftoning process in the contone dotgen unit. The control flag dm_read_enable
enables the reading of the dither matrix table line structure from DRAM. If dm_read_enable is 0, the
dither matrix is not specified in DRAM and no DRAM accesses are attempted. The dither matrix table
interface has an output flag dm_avail which specifies if the current line of the specified matrix is available.
The HCU can be directed to stall when dm_avail is 0 by setting the appropriate bit in the HCU's AvailMask
or TMMask registers. When dm_avail is 0 the value in the DitherConstant register is used as the dither
cell value that is output to the contone dotgen unit.

The dither matrix table interface consists of a state machine that interfaces to the DRAM interface, a dither
matrix buffer that provides dither matrix values, and a unit to generate the addresses for reading the buffer.
Figure 200 shows a block diagram of the dither matrix table interface.



Figure 200. Block diagram of dither matrix table interface (inputs: advline, advdot, dm_init_index_c[0-3],
dm_lwr_index_c[0-3], DoubleLineBuf, start_dm_adr, line_increment, dm_read_enable, dither_constant;
outputs: cp0_dither_val, cp1_dither_val, cp2_dither_val, cp3_dither_val)



28.4.3.4.1 Dither matrix buffer 



The state machine loads dither matrix table data a line at a time from DRAM and stores it in a buffer. A
single line of the dither matrix is either 256 or 128 8-bit entries, depending on the programmable bit
DoubleLineBuf (128 entries when it is enabled, since both line buffers must then share the 256-byte register
array). If this bit is enabled, a double-buffer mechanism is employed such that while one buffer is read
from for the current line's dither matrix data (8 bits representing a single dither matrix entry), the other
buffer is being written to with the next line's dither matrix data (64 bits at a time). Alternatively, the
single buffer scheme can be used, where the data must be loaded at the end of the line, thus incurring a
delay.

The single/double buffer is implemented using a 256-byte 3-port register array (two read ports, one write
port), with the reads clocked at double the system clock rate (320 MHz), allowing 4 reads per clock cycle.

The dither matrix buffer unit also keeps track of the current read and write buffers, and ensures that a
buffer cannot be read from until it has been written to. In this case, each buffer is a line of the dither
matrix, i.e. 256 or 128 bytes.

A bit is kept for the status of each dither matrix line buffer: buff_avail[0] and buff_avail[1]. The unit also
keeps a single bit (rd_buff) for the current buffer that reads are to occur from, and a single bit (wr_buff) for
the current buffer that writes are to occur to. The output value dm_avail equals buff_avail[rd_buff]. The
output value ok_to_write equals the inverse of buff_avail[wr_buff], i.e. a buffer can only be written once its
previous contents have been consumed. Note that when using a single line buffer, buff_avail[1] is not used.

The read addresses are byte aligned. A single dither matrix entry is represented by 8 bits and an entry is
read for each of the four contone planes in parallel. When an advline pulse is received, buff_avail[rd_buff]
is cleared, and rd_buff is inverted (if using a double line buffer).

Data is written, 64 bits at a time, to the current write buffer when diu_hcu_rvalid is asserted. When WrAdr
is 0x1F and diu_hcu_rvalid is 1, buff_avail[wr_buff] is set, and wr_buff is inverted (if using a double line
buffer). This indicates that a line of dither matrix has been written to the current write buffer and it is now
available to be read.
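The buffer bookkeeping can be modelled in Python as follows. This is an illustrative sketch:
DitherMatrixBuffer is a hypothetical name, the little-endian byte order within each 64-bit word is assumed,
ok_to_write is taken as the inverse of buff_avail[wr_buff], and in double-buffer mode each 128-byte half is
assumed to complete at write address 0x0F or 0x1F (the text gives 0x1F, the end of the 256-byte array).

```python
class DitherMatrixBuffer:
    """Bookkeeping model of the dither matrix buffer (sketch, not RTL)."""

    def __init__(self, double_line_buf: bool):
        self.double = double_line_buf
        self.line_bytes = 128 if double_line_buf else 256
        self.mem = bytearray(256)          # the 256-byte register array
        self.buff_avail = [False, False]
        self.rd_buff = 0
        self.wr_buff = 0
        self.wr_adr = 0                    # modulo-32 index of 64-bit words

    @property
    def dm_avail(self) -> bool:
        return self.buff_avail[self.rd_buff]

    @property
    def ok_to_write(self) -> bool:
        return not self.buff_avail[self.wr_buff]

    def write_word(self, word: int) -> None:
        """Store one 64-bit word from the DIU (diu_hcu_rvalid asserted)."""
        base = self.wr_adr * 8
        self.mem[base:base + 8] = word.to_bytes(8, "little")  # byte order assumed
        words_per_line = self.line_bytes // 8
        if self.wr_adr % words_per_line == words_per_line - 1:
            self.buff_avail[self.wr_buff] = True   # line complete, now readable
            if self.double:
                self.wr_buff ^= 1
        self.wr_adr = (self.wr_adr + 1) % 32

    def advline(self) -> None:
        """End of line: release the current read buffer."""
        self.buff_avail[self.rd_buff] = False
        if self.double:
            self.rd_buff ^= 1

    def read(self, rd_adr: int) -> int:
        """Byte rd_adr of the current read buffer (one dither matrix entry)."""
        return self.mem[self.rd_buff * self.line_bytes + rd_adr]
```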



28.4.3.4.2 Read address generator

For each contone plane there is an initial, lower and upper index to be used when reading dither cell values
from the dither matrix double buffer. The read address for each plane is used to select a byte from the
current 256-byte read buffer. When Go gets set (0 to 1 transition), or at the end of a line, the read addresses
are set to their corresponding initial index. Otherwise, the read address generator relies on advdot to
advance the addresses within the inclusive range specified by the lower and upper indices, represented by
the following pseudocode:

    if (advdot == 1) then
        if (advline == 1) then
            rd_adr = dm_init_index
        elsif (rd_adr == dm_upr_index) then
            rd_adr = dm_lwr_index
        else
            rd_adr ++
    else
        rd_adr = rd_adr
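The read address pseudocode translates directly into Python. The helper line_addresses (hypothetical) runs
one line for a single contone plane, showing how the initial, lower and upper indices make a dither cell
narrower than the buffer line repeat across the page:

```python
def next_rd_adr(rd_adr: int, advdot: bool, advline: bool,
                init_index: int, lwr_index: int, upr_index: int) -> int:
    """One clock of a per-plane read address generator."""
    if not advdot:
        return rd_adr                 # hold
    if advline:
        return init_index             # end of line: restart at the initial index
    if rd_adr == upr_index:
        return lwr_index              # wrap within the inclusive [lwr, upr] range
    return rd_adr + 1


def line_addresses(n_dots: int, init_index: int, lwr_index: int, upr_index: int) -> list:
    """Read addresses used for the n_dots of one line (advdot every cycle)."""
    adr, out = init_index, []
    for dot in range(n_dots):
        out.append(adr)
        adr = next_rd_adr(adr, True, dot == n_dots - 1, init_index, lwr_index, upr_index)
    return out
```

For example, with init_index = 2, lwr_index = 0 and upr_index = 3, a 6-dot line reads addresses 2, 3, 0, 1, 2, 3.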



28.4.3.4.3 State machine

The dither matrix is read from DRAM in single 256-bit accesses, receiving the data from the DIU over 4
clock cycles (64 bits per cycle). The protocol and timing for read accesses to DRAM is described in
section 20.9.1 on page 208. Read accesses to DRAM are implemented by means of the state machine
described in Figure 201.

All counters and flags should be cleared after reset or when Go transitions from 0 to 1. While the Go bit is
1, the state machine relies on the dm_read_enable bit to tell it whether to attempt to read dither matrix data
from DRAM. When dm_read_enable is clear, the state machine does nothing and remains in the idle state.
When dm_read_enable is set, the state machine continues to load dither matrix data, 256 bits at a time
(received over 4 clock cycles, 64 bits per cycle), while there is space available in the dither matrix buffer.
The read address and line_start_adr are initially set to start_dm_adr. The read address gets incremented
after each read access. It takes 8 read accesses to load a 256-byte line of dither matrix into a single line
buffer, or 4 to load a 128-byte line when the double buffer is used. A count is kept of the accesses to
DRAM. When a read access completes and access_count equals 3 (double buffer) or 7 (single buffer), a line
of dither matrix has just been loaded from DRAM, and the read address is updated to line_start_adr plus
line_increment so it points to the start of the next line of dither matrix (line_start_adr is also updated to
this value). If the read address equals end_dm_adr then the next read address will be start_dm_adr, thus
the read address wraps to point to the start of the area in DRAM where the dither matrix is stored.

The write address for the dither matrix buffer is implemented by means of a modulo-32 counter that is
initially set to 0 and incremented when diu_hcu_rvalid is asserted.
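The address sequencing of the state machine (ignoring the req/ack/rvalid handshake and the buffer-space
check) can be sketched in Python as below. Addresses are assumed to be in units of one 256-bit DRAM
access, and the function name is hypothetical; the end_dm_adr check takes priority, as in Figure 201.

```python
def dither_read_addresses(start_dm_adr: int, end_dm_adr: int,
                          line_increment: int, double_line_buf: bool,
                          n: int) -> list:
    """First n DRAM read addresses issued by the dither matrix state machine."""
    last = 3 if double_line_buf else 7    # 4 or 8 accesses per line
    radr = line_start_adr = start_dm_adr
    access_count = 0
    issued = []
    for _ in range(n):
        issued.append(radr)
        if radr == end_dm_adr:            # wrap to the start of the matrix
            radr = line_start_adr = start_dm_adr
            access_count = 0
        elif access_count == last:        # line complete: jump to next line
            line_start_adr += line_increment
            radr = line_start_adr
            access_count = 0
        else:
            radr += 1
            access_count += 1
    return issued
```

A line_increment larger than the number of accesses per line lets consecutive dither matrix lines be stored
with a stride in DRAM, as in the second test case.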
Figure 201. State machine to read dither matrix table (states: reset, idle, req, ack, read1, read2, read3, read4)
28.4.4 Contone dotgen unit



The contone dotgen unit is responsible for producing a dot in up to 4 color planes per cycle. The contone
dotgen unit also produces a cp_avail flag which specifies whether or not contone pixels are currently
available, and the output hcu_cfu_advdot to request the CFU to provide the next contone pixel in up to 4 color
planes.

The block diagram for the contone dotgen unit is shown in Figure 202. 



Figure 202. Contone dotgen unit (inputs: cfu_hcu_avail, cfu_hcu_c0data to cfu_hcu_c3data, advdot,
cp0_constant to cp3_constant; dither units 0 to 3; output: hcu_cfu_advdot)

A dither unit provides the functionality for dithering a single contone plane. The contone image is only
defined within the contone/spot margin area. As a result, if the input flag in_target_page is 0, then a
constant contone pixel value is used for the pixel instead of the contone plane.

The resultant contone pixel is then halftoned. The dither value to be used in the halftoning process is
provided by the control unit. The halftoning process involves a comparison between a pixel value and its
corresponding dither value. If the 8-bit contone value is greater than or equal to the 8-bit dither matrix
value a 1 is output. If not, then a 0 is output. This means each entry in the dither matrix is in the range
1-255 (0 is not used, since a dither value of 0 would turn on the dot even for a contone value of 0).
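A per-plane sketch of the dither unit in Python, including the DitherConstant fallback used when dm_avail is
0 (Section 28.4.3.4). Function names are hypothetical; all values are 8-bit integers.

```python
def dither_value(dm_avail: bool, dm_entry: int, dither_constant: int) -> int:
    """Dither cell value supplied to a dither unit by the control unit."""
    return dm_entry if dm_avail else dither_constant


def contone_dot(contone: int, dither: int, in_target_page: bool, cp_constant: int) -> int:
    """Halftone one 8-bit contone pixel against its 8-bit dither value."""
    pixel = contone if in_target_page else cp_constant   # margins use the constant
    return 1 if pixel >= dither else 0
```

Restricting dither entries to 1-255 guarantees that a contone value of 0 never fires a dot, while 255 always
does.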
28.4.5 Spot dotgen unit

The spot dotgen unit is responsible for producing a dot of bi-level data per cycle. It deals with bi-level data
(and therefore does not need to halftone) that comes from the LBD via the SFU. Like the contone layer,
the bi-level spot layer is only defined within the contone/spot margin area. As a result, if input flag
in_target_page is 0, then a constant dot value (typically this would be 0) is used for the output dot.

The spot dotgen unit also produces an s_avail flag which specifies whether or not spot dots are currently
available for this spot plane, and the output hcu_sfu_advdot to request the SFU to provide the next bi-level
data value. The spot dotgen unit can be represented by the following pseudocode:

    s_avail = sfu_hcu_avail

    if (in_target_page == 1 AND advdot == 1) then
        hcu_sfu_advdot = 1
    else
        hcu_sfu_advdot = 0

    if (in_target_page == 1) then
        sp = sfu_hcu_sdata
    else
        sp = sp_constant

28.4.6 Tag dotgen unit 

This unit is very similar to the spot dotgen unit (see Section 28.4.5) in that it deals with bi-level data, in
this case from the TE via the TFU. The tag layer is only defined within the tag margin area. As a result, if
input flag in_tag_target_page is 0, then a constant dot value, tp_constant (typically this would be 0), is
used for the output dot. The tagplane dotgen unit also produces a tp_avail flag which specifies whether or
not tag dots are currently available for the tagplane, and the output hcu_tfu_advdot to request the TFU to
provide the next bi-level data value.
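Because the spot and tag dotgen units differ only in their source FIFO (SFU or TFU), their target-area flag
(in_target_page or in_tag_target_page) and their constant, a single Python sketch of the spot pseudocode
covers both; the names BilevelOut and bilevel_dotgen are hypothetical.

```python
from typing import NamedTuple


class BilevelOut(NamedTuple):
    avail: bool        # s_avail or tp_avail
    advdot_req: bool   # hcu_sfu_advdot or hcu_tfu_advdot
    dot: int           # sp or tp


def bilevel_dotgen(fifo_avail: bool, fifo_data: int, in_target: bool,
                   advdot: bool, constant: int = 0) -> BilevelOut:
    """Pass FIFO data through inside the target area; otherwise output the
    constant and do not request (consume) any FIFO data."""
    return BilevelOut(
        avail=fifo_avail,
        advdot_req=in_target and advdot,
        dot=fifo_data if in_target else constant,
    )
```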

28.4.7 Dot reorg unit

The dot reorg unit provides a means of mapping the bi-level dithered data, the spot0 color, and the tag data
to output inks in the actual printhead. Each dot reorg unit takes a set of 6 1-bit inputs and produces a single
bit output that represents the output dot for that color plane.

The output bit is a logical combination of any or all of the input bits. This allows the spot color to be 
placed in any output color plane (including infrared for testing purposes), black to be merged into cyan, 
magenta and yellow (in the case of no black ink in the Memjet printhead), and tag dot data to be placed in 
a visible plane. An output for fixative can readily be generated by simply combining desired input bits. 

The dot reorg unit contains a 64-bit lookup to allow complete freedom with regards to mapping. Since all
possible combinations of input bits are accounted for in the 64-bit lookup, a given dot reorg unit can take
the mapping of other reorg units into account. For example, a black plane reorg unit may produce a 1 only
if the contone plane 3 or spot color inputs are set (this effectively composites black bi-level over the
contone). A fixative reorg unit may generate a 1 if any 2 of the output color planes are set (taking into
account the mappings produced by the other reorg units).

If dead nozzle replacement is to be used (see section 29.4.2 on page 448), the dot reorg can be
programmed to direct the dots of the specified color into the main plane, and 0 into the other. If a nozzle is
then marked as dead in the DNC, swapping the bits between the planes will result in 0 in the dead nozzle,
and the required data in the other plane.

If dead nozzle replacement is to be used, and there are no tags, the TE can be programmed with the
position of dead nozzles and the resultant pattern used to direct dots into the specified nozzle row. If only
fixed background TFS is to be used, a limited number of nozzles can be replaced. If variable tag data is to be
used to specify dead nozzles, then large numbers of dead nozzles can be readily compensated for.

The dot reorg unit can be used to average out the nozzle usage when two rows of nozzles share the same
ink and tag encoding is not being used. The TE can be programmed to produce a regular pattern (e.g. 0101
on one line, and 1010 on the next) and this pattern can be used as a directive to direct dots into the
specified nozzle row.

Each reorg unit contains a 64-bit IOMapping value programmable as two 32-bit HCU registers, and a set
of selection logic based on the 6-bit dot input (2^6 = 64 bits), as shown in Figure 203.



Figure 203. Block diagram of dot reorg unit (the 6-bit input dot selects one bit of the 64-bit IOMapping as the
output dot)

The mapping of input bits to each of the 6 selection bits is as defined in Table 145.

Table 145. Mapping of input bits to 6 selection bits

Bit   Input                               Typical ink
0     bi-level dot from contone layer 0   cyan
1     bi-level dot from contone layer 1   magenta
2     bi-level dot from contone layer 2   yellow
3     bi-level dot from contone layer 3   black
4     bi-level spot0 dot                  black
5     bi-level tag dot                    infra-red
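The IOMapping lookup can be sketched in Python as a 64-entry truth table indexed by the 6-bit input dot,
using the selection bit order of Table 145. The example programming (CMY straight through, spot0
composited into black, tags to infrared, fixative when two or more of the CMYK outputs are set) is
illustrative only, as is the assumption that the low 32 bits go in the first of the two HCU registers.

```python
# Selection bit order from Table 145.
C0, C1, C2, C3, SPOT0, TAG = range(6)


def reorg_dot(io_mapping: int, dot_in: int) -> int:
    """Output dot = bit number `dot_in` (a 6-bit index) of the 64-bit IOMapping."""
    return (io_mapping >> (dot_in & 0x3F)) & 1


def build_mapping(rule) -> int:
    """Pack rule(index) for all 64 input combinations into an IOMapping value."""
    mapping = 0
    for idx in range(64):
        if rule(idx):
            mapping |= 1 << idx
    return mapping


def bit(idx: int, n: int) -> int:
    return (idx >> n) & 1


# Hypothetical example programming of five reorg units.
cyan = build_mapping(lambda i: bit(i, C0))
magenta = build_mapping(lambda i: bit(i, C1))
yellow = build_mapping(lambda i: bit(i, C2))
black = build_mapping(lambda i: bit(i, C3) or bit(i, SPOT0))   # spot0 over contone
infrared = build_mapping(lambda i: bit(i, TAG))
# Fixative: set when at least 2 of the other *output* planes are set, which is
# possible because every input combination is covered by the lookup.
fixative = build_mapping(
    lambda i: sum(reorg_dot(m, i) for m in (cyan, magenta, yellow, black)) >= 2)

# Split into the two 32-bit HCU registers (low word assumed first).
lo_word, hi_word = black & 0xFFFFFFFF, black >> 32
```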
29 Dead Nozzle Compensator (DNC)



29.1 Overview 

The Dead Nozzle Compensator (DNC) is responsible for adjusting Memjet dot data to take account of
non-functioning nozzles in the Memjet printhead. Input dot data is supplied from the HCU, and the
corrected dot data is passed out to the DWU. The high level data path is shown by the block diagram in
Figure 204.







Figure 204. High level block diagram of DNC (raw dot data flows from the HCU into the DNC, which also reads
dead nozzle data from DRAM and passes compensated dot data to the DWU)



The DNC compensates for dead nozzles by performing the following operations:

• Dead nozzle removal, i.e. turn the nozzle off 

• Ink replacement by direct substitution i.e. K -> K_alternative

• Ink replacement by indirect substitution i.e. K -> CMY 

• Error diffusion to adjacent nozzles 

• Fixative corrections 

The DNC is required to efficiently support up to 5% dead nozzles under the expected DRAM bandwidth
allocation, with no restriction on where dead nozzles are located, and to handle any fixative correction due
to nozzle compensations. Performance must degrade gracefully above 5% dead nozzles.

29.2 Dead nozzle identification 

Dead nozzles are identified by means of a position value and a mask value. Position information is
represented by a 10-bit delta encoded format, where the 10-bit value defines the number of dots between
dead nozzle columns¹. Along with the delta information, the DNC also reads the 6-bit dead nozzle mask
(dn_mask) for the defined dead nozzle position. Each bit in the dn_mask corresponds to an ink plane. A set
bit indicates that the nozzle for the corresponding ink plane is dead. The dead nozzle table format is shown
in Figure 205. The DNC reads dead nozzle information from DRAM in single 256-bit accesses. A 10-bit delta
encoding scheme is chosen so that each table entry is 16 bits wide, and 16 entries fit exactly in each 256-bit
read. Using 10-bit delta encoding means that the maximum distance between dead nozzle columns is 1023
dots. It is possible that dead nozzles may be spaced further than 1023 dots from each other, so a null dead
nozzle identifier is required. A null dead nozzle identifier is defined as a 6-bit dn_mask of all zeros. These
null dead nozzle identifiers should also be used so that:

• the dead nozzle table is a multiple of 16 entries (so that it is aligned to the 256-bit DRAM locations)

• the dead nozzle table spans the complete length of the line, i.e. the first entry in the dead nozzle table
should have a delta from the first nozzle column in a line and the last entry in the dead nozzle table should
correspond to the last nozzle column in a line.

1. For a 10-bit delta value of d, if the current column n is a dead nozzle column then the next dead nozzle
column is given by n + (d + 1).

Note that the DNC deals with the width of a page. This may or may not be the same as the width of the 
printhead (the PHI may introduce some margining to the page so that its dot output matches the width of 
the printhead). Care must be taken when programming the dead nozzle table so that dead nozzle positions 
are correctly specified with respect to the page and printhead. 



Figure 205. Dead nozzle table format (N dead nozzle columns; each 16-bit table entry holds a 10-bit delta
encode in bits 15-6 and a 6-bit DnMask in bits 5-0)
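A Python sketch of encoding and decoding the dead nozzle table, following the footnote rule that a delta of d
moves from dead column n to column n + (d + 1). We assume the first delta is measured as if the previous
dead column were column -1 (so a first delta of d marks column d); padding the table to a multiple of 16
entries and to the last nozzle column is left out. All function names are hypothetical.

```python
def encode_entry(delta: int, dn_mask: int) -> int:
    """Pack one 16-bit entry: delta in bits 15-6, dn_mask in bits 5-0."""
    return ((delta & 0x3FF) << 6) | (dn_mask & 0x3F)


def decode_table(entries) -> list:
    """Return [(column, dn_mask)] for every non-null entry."""
    dead, col = [], -1          # assumption: first delta counts from column 0
    for entry in entries:
        delta, dn_mask = (entry >> 6) & 0x3FF, entry & 0x3F
        col += delta + 1        # next column = n + (d + 1)
        if dn_mask:             # all-zero mask = null entry, position only
            dead.append((col, dn_mask))
    return dead


def encode_table(dead) -> list:
    """Encode sorted [(column, dn_mask)], inserting null entries for gaps
    longer than a 10-bit delta can express."""
    entries, col = [], -1
    for column, dn_mask in dead:
        gap = column - col - 1
        while gap > 0x3FF:
            entries.append(encode_entry(0x3FF, 0))   # null entry: advances 1024 columns
            gap -= 0x400
        entries.append(encode_entry(gap, dn_mask))
        col = column
    return entries
```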



29.3 DRAM storage and bandwidth requirement 

The memory required is largely a factor of the number of dead nozzles present in the printhead (which in
turn is a factor of the printhead size). The DNC is required to read a 16-bit entry from the dead nozzle table
for every dead nozzle. Table 146 shows the DRAM storage and average¹ bandwidth requirements for the
DNC for different percentages of dead nozzles and different page sizes.



Table 146. Dead Nozzle storage and average bandwidth requirements

Printhead   Dead nozzles   Memory (KBytes)   Bandwidth (bits/cycle)
A4 (a)      5%             1.4 (c)           0.8 (d)
            10%            2.7               1.6
            15%            4.1               2.4
A3 (b)      5%             1.9               0.8
            10%            3.8               1.6
            15%            5.7               2.4

a. Bi-lithic printhead has 13824 nozzles per color providing full bleed printing for A4/Letter
b. Bi-lithic printhead has 19488 nozzles per color providing full bleed printing for A3
c. 16 bits x 13824 nozzles x 0.05 dead
d. (16 bits read / 20 cycles) = 0.8 bits/cycle

1. Average bandwidth assumes an even spread of dead nozzles. Clumps of dead nozzles may cause delays due
to insufficient available DRAM bandwidth. These delays will occur every line, causing a cumulative delay
over a page.
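The figures in Table 146 follow from simple arithmetic: one 16-bit entry per dead nozzle column, and one
dot column processed per cycle. The sketch below reproduces them, assuming a KByte of 1024 bytes (which is
what matches the tabulated 2.7 KBytes for A4 at 10%); the function name is hypothetical.

```python
ENTRY_BITS = 16


def dnc_requirements(nozzles_per_color: int, dead_fraction: float):
    """Return (storage in KBytes, average bandwidth in bits/cycle)."""
    entries = nozzles_per_color * dead_fraction      # one entry per dead column
    kbytes = entries * ENTRY_BITS / 8 / 1024
    # One dot column per cycle, so on average one entry is needed every
    # 1 / dead_fraction cycles (every 20 cycles at 5%).
    bits_per_cycle = ENTRY_BITS * dead_fraction
    return kbytes, bits_per_cycle


for name, nozzles in (("A4", 13824), ("A3", 19488)):
    for pct in (5, 10, 15):
        kb, bw = dnc_requirements(nozzles, pct / 100)
        print(f"{name} {pct:>2}%  {kb:.2f} KBytes  {bw:.1f} bits/cycle")
```

The bandwidth column is independent of the page size because it only depends on the fraction of dead
columns.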



29.4 Nozzle compensation



The DNC receives 6 bits of dot information every cycle from the HCU, 1 bit per color plane. When the dot
position corresponds to a dead nozzle column, the associated 6-bit dn_mask indicates which ink plane(s)
contains a dead nozzle(s). The DNC first deletes dots destined for the dead nozzle. It then replaces those
dead dots, either by placing the data destined for the dead nozzle into an adjacent ink plane (direct
substitution) or into a number of ink planes (indirect substitution). After ink replacement, if a dead nozzle is
made active again then the DNC performs error diffusion. Finally, following the dead nozzle compensation
mechanisms the fixative, if present, may need to be adjusted due to new nozzles being activated, or
dead nozzles being removed.



29.4.1 Dead nozzle removal

If a nozzle is defined as dead, then the first action for the DNC is to turn off (zero) the dot data destined
for that nozzle. This is done by a bit-wise ANDing of the inverse of the dn_mask with the dot value.

29.4.2 Ink replacement 



Ink replacement is a mechanism where data destined for the dead nozzle is placed into an adjacent ink
plane of the same color (direct substitution, i.e. K -> K_alternative), or placed into a number of ink planes,
the combination of which produces the desired color (indirect substitution, i.e. K -> CMY). Ink replacement
is performed by filtering out ink belonging to nozzles that are dead and then adding back in an appropriately
calculated pattern. This two-step process allows the optional re-inclusion of the ink data into the original
dead nozzle position to be subsequently error diffused. In the general case, fixative data destined for a dead
nozzle should not be left active intending it to be later diffused.

The ink replacement mechanism has 6 ink replacement patterns, one per ink plane, programmable by the
CPU. The dead nozzle mask is ANDed with the dot data to see if there are any planes where the dot is
active but the corresponding nozzle is dead. The resultant value forms an enable, on a per ink basis, for the
ink replacement process. If replacement is enabled for a particular ink, the values from the corresponding
replacement pattern register are ORed into the dot data. The output of the ink replacement process is then
filtered so that error diffusion is only allowed for the planes in which error diffusion is enabled. The output
of the ink replacement logic is ORed with the resultant dot after dead nozzle removal. See Figure 210 on
page 459 for implementation details.

For example, consider the printhead color configuration C, M, Y, K1, K2, IR, with the input dot data from the
HCU being b101100. Assume that the K1 ink plane and IR ink plane for this position are dead, so the dead
nozzle mask is b000101. The DNC first removes the dead nozzle by zeroing the K1 plane to produce
b101000. Then the dead nozzle mask is ANDed with the dot data to give b000100, which selects the ink
replacement pattern for K1 (in this case the ink replacement pattern for K1 is configured as b000010, i.e.
ink replacement into the K2 plane). Provided that error diffusion for K2 is enabled, the output from the ink
replacement process is b000010. This is ORed with the output of dead nozzle removal to produce the
resultant dot b101010. As can be seen, the dot data in the defective K1 nozzle was removed and replaced
by a dot in the adjacent K2 nozzle in the same dot position, i.e. direct substitution.

In the example above the K1 ink plane could be compensated for by indirect substitution, in which case the
ink replacement pattern for K1 would be configured as b111000 (substitution into the CMY color planes),
and this is ORed with the output of dead nozzle removal to produce the resultant dot b111000. Here the dot
data in the defective ink plane was removed and placed into the CMY ink planes.
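Dead nozzle removal and ink replacement can be sketched in Python as below, reproducing the worked
example. The plane order C, M, Y, K1, K2, IR (most significant bit first, as in the b101100 notation) comes
from the example; the ed_enable argument standing in for the per-plane error diffusion enable is a
hypothetical name, as is the function itself.

```python
PLANES = ("C", "M", "Y", "K1", "K2", "IR")   # MSB first, as in b101100


def plane_bit(name: str) -> int:
    return 1 << (len(PLANES) - 1 - PLANES.index(name))


def compensate(dot: int, dn_mask: int, patterns: dict, ed_enable: int = 0x3F) -> int:
    """Dead nozzle removal followed by ink replacement for one 6-bit dot."""
    removed = dot & ~dn_mask & 0x3F          # switch dead nozzles off
    needs_replacing = dot & dn_mask          # active dots destined for dead nozzles
    replacement = 0
    for name in PLANES:
        if needs_replacing & plane_bit(name):
            replacement |= patterns.get(name, 0)   # OR in that plane's pattern
    replacement &= ed_enable                 # only planes with diffusion enabled
    return removed | replacement
```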
29.4.3 Error diffusion

Based on the programming of the lookup table the dead nozzle may be left active after ink replacement. In
such cases the DNC can compensate using error diffusion. Error diffusion is a mechanism where dead
nozzle dot data is diffused to adjacent dots.

When a dot is active and its destined nozzle is dead, the DNC will attempt to place the data into an adja- 
cent dot position, if one is inactive. If both dots are inactive then the choice is arbitrary, and is determined 
by a pseudo random bit generator. If both neighbor dots are already active then the bit cannot be compen- 
sated by diffusion. 

Since the DNC needs to look at neighboring dots to determine where to place the new bit (if required), the
DNC works on a set of 3 dots at a time. For any given set of 3 dots, the first dot received from the HCU is
referred to as dot A, the second as dot B, and the third as dot C. The relationship is shown in Figure 206.



Figure 206. Set of dots operated on for error diffusion (dots A, B and C at consecutive positions, starting with
dot A at position n-1, shown with the direction of dot movement)



For any given set of dots ABC, only B can be compensated for by error diffusion if B is defined as dead. A 
1 in dot B will be diffused into either dot A or dot C if possible. If there is already a 1 in dot A or dot C 
then a 1 in dot B cannot be diffused into that dot. 

The DNC must support adjacent dead nozzles. Thus if dot A is defined as dead and has previously been 
compensated for by error diffusion, then the dot data from dot B should not be diffused into dot A. Simi- 
larly, if dot C is defined as dead, then dot data from dot B should not be diffused into dot C. 

Error diffusion should not cross line boundaries. If dot B contains a dead nozzle and is the first dot in a line 
then dot A represents the last dot from the previous line. In this case an active bit on a dead nozzle of dot B 
should not be diffused into dot A. Similarly, if dot B contains a dead nozzle and is the last dot in a line then 
dot C represents the first dot of the next line. In this case an active bit on a dead nozzle of dot B should not 
be diffused into dot C. 

Thus, as a rule, a 1 in dot B cannot be diffused into dot A if

• a 1 is already present in dot A, 

• dot A is defined as dead, 

• or dot A is the last dot in a line. 

Similarly, a 1 in dot B cannot be diffused into dot C if

• a 1 is already present in dot C, 

• dot C is defined as dead, 

• or dot C is the first dot in a line. 

If B is defined to be dead and the dot value for B is 0, then no compensation needs to be done and dots A
and C do not need to be changed.

If B is defined to be dead and the dot value for B is 1, then B is changed to 0 and the DNC attempts to
place the 1 from B into either A or C:

• If the dot can be placed into both A and C, then the DNC must choose between them. The preference is
given by the current output from the random bit generator, 0 for "prefer left" (dot A) or 1 for "prefer
right" (dot C).

• If the dot can be placed into only one of A and C, then the 1 from B is placed into that position.

• If the dot cannot be placed into either A or C, then the DNC cannot place the dot in either position.

Table 147 shows the truth table for DNC error diffusion operation when dot B is defined as dead.

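Table 147. Error diffusion truth table when dot B is dead

"A can accept" means the A input is 0, dot A is not dead and dot A is not the last dot in a line. "C can accept"
means the C input is 0, dot C is not dead and dot C is not the first dot in a line. x = don't care.

B input  A can accept  C can accept  Random bit(a)  A output  B output  C output
0        x             x             x              A input   0         C input
1        no            no            x              A input   0         C input
1        no            yes           x              A input   0         1(b)
1        yes           no            x              1(b)      0         C input
1        yes           yes           0              1(b)      0         C input
1        yes           yes           1              A input   0         1(b)

a. Output from random bit generator. Determines direction of error diffusion (0 = left, 1 = right).

b. Marks an output where the DNC inserted a 1.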
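The diffusion rules above can be modelled per ink plane in a few lines of Python. This is an illustrative sketch, not the DNC's combinatorial logic: the function and parameter names (`diffuse_plane`, `a_is_last_in_line`, `c_is_first_in_line`) are hypothetical, and the hardware evaluates all 6 planes in parallel using the dead nozzle masks and dot count described in the implementation section.

```python
def can_accept(value: int, dead: bool, across_line_boundary: bool) -> bool:
    """A neighbouring dot can take the diffused 1 only if it is still 0,
    its nozzle is not dead, and it does not lie across a line boundary."""
    return value == 0 and not dead and not across_line_boundary


def diffuse_plane(a: int, b: int, c: int,
                  a_dead: bool, b_dead: bool, c_dead: bool,
                  a_is_last_in_line: bool, c_is_first_in_line: bool,
                  random_bit: int) -> tuple:
    """Return the (A, B, C) bits of one ink plane after dead nozzle
    compensation of dot B (hypothetical model of Table 147)."""
    if not b_dead or b == 0:
        return a, b, c                  # nothing to compensate
    b = 0                               # a dead nozzle cannot print the 1
    a_ok = can_accept(a, a_dead, a_is_last_in_line)
    c_ok = can_accept(c, c_dead, c_is_first_in_line)
    if a_ok and c_ok:
        if random_bit == 0:             # 0 = prefer left (dot A)
            a = 1
        else:                           # 1 = prefer right (dot C)
            c = 1
    elif a_ok:
        a = 1
    elif c_ok:
        c = 1
    # Otherwise neither neighbour can take the 1 and it is dropped.
    return a, b, c


# Dead B with both neighbours free: the random bit picks the side.
print(diffuse_plane(0, 1, 0, False, True, False, False, False, 0))  # (1, 0, 0)
print(diffuse_plane(0, 1, 0, False, True, False, False, False, 1))  # (0, 0, 1)
```

Passing the line-boundary conditions as flags keeps the sketch independent of how the dot count is maintained; in the DNC they are derived from the dot count (0 or 1) as described for the diffuse unit.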

The random bit value used to arbitrarily select the direction of diffusion is generated by a 32-bit maximum 
length random bit generator. The generator generates a new bit for each dot in a line regardless of whether 
the dot is dead or not. The random bit generator can be initialized with a 32-bit programmable seed value.



29.4.4 Fixative correction 

After the dead nozzle compensation methods have been applied to the dot data, the fixative, if present, may
need to be adjusted due to new nozzles being activated or dead nozzles being removed. For each output
dot the DNC determines whether fixative is required (using the FixativeRequiredMask register) for the new
compensated dot data word and whether fixative is already activated for that dot. To do so, the DNC needs
to know which color plane contains fixative; this is specified by the FixativeMask1 configuration register.
Table 148 indicates the actions to take based on these calculations.



Table 148. Truth table for fixative correction

Fixative present  Fixative required  Action
1                 1                  Output dot as is.
1                 0                  Clear fixative plane.
0                 1                  Attempt to add fixative.
0                 0                  Output dot as is.



The DNC also allows the specification of another fixative plane, specified by the FixativeMask2 configuration
register, with FixativeMask1 having higher priority than FixativeMask2. When attempting to add
fixative, the DNC first tries to add it into the planes defined by FixativeMask1. However, if any of these
planes is dead, it tries to add fixative by placing it into the planes defined by FixativeMask2.
Note that the fixative defined by FixativeMask1 and FixativeMask2 could possibly be multi-part fixative,
i.e. 2 bits could be set in FixativeMask1, with the fixative being a combination of both inks.



Doc: SoPEC_hardware_design   S3 Proprietary Document

Version: 2.3 



29 Nov 2002 
Page 451 



SoPEC : Hardware Design 






29.5 Implementation 

A block diagram of the DNC is shown in Figure 207. 
Figure 207. Block diagram of DNC

29.5.1 Definitions of I/O



Table 149. DNC port list and description

Port name             Pins  I/O  Description
Clocks and Resets
pclk                  1     In   System clock.
prst_n                1     In   System reset, synchronous active low.
PCU Interface
pcu_dnc_sel           1     In   Block select from the PCU. When pcu_dnc_sel is high both
                                 pcu_adr and pcu_dataout are valid.
pcu_rwn               1     In   Common read/not-write signal from the PCU.
pcu_adr[6:2]          5     In   PCU address bus. Only 5 bits are required to decode the
                                 address space for this block.
pcu_dataout[31:0]     32    In   Shared write data bus from the PCU.
dnc_pcu_rdy           1     Out  Ready signal to the PCU. When dnc_pcu_rdy is high it
                                 indicates the last cycle of the access. For a write cycle
                                 this means pcu_dataout has been registered by the block and
                                 for a read cycle this means the data on dnc_pcu_data is valid.
dnc_pcu_data[31:0]    32    Out  Read data bus to the PCU.
DIU Interface
dnc_diu_rreq          1     Out  DNC unit requests DRAM read. A read request must be
                                 accompanied by a valid read address.
dnc_diu_radr[21:5]    17    Out  Read address to DIU. 256-bit word aligned.
diu_dnc_rack          1     In   Acknowledge from DIU that read request has been accepted
                                 and new read address can be placed on dnc_diu_radr.
diu_dnc_rvalid        1     In   Read data valid, active high. Indicates that valid read data
                                 is now on the read data bus, diu_data.
diu_data[63:0]        64    In   Read data from DIU.
HCU Interface
dnc_hcu_ready         1     Out  Indicates that DNC is ready to accept data from the HCU.
hcu_dnc_avail         1     In   Indicates valid data present on hcu_dnc_data.
hcu_dnc_data[5:0]     6     In   Output bi-level dot data in 6 ink planes.
DWU Interface
dwu_dnc_ready         1     In   Indicates that DWU is ready to accept data from the DNC.
dnc_dwu_avail         1     Out  Indicates valid data present on dnc_dwu_data.
dnc_dwu_data[5:0]     6     Out  Output bi-level dot data in 6 ink planes.



29.5.2 Configuration registers 

The configuration registers in the DNC are programmed via the PCU interface. Refer to section 21.8.2 on
page 257 for the description of the protocol and timing diagrams for reading and writing registers in the
DNC. Note that since addresses in SoPEC are byte aligned and the PCU only supports 32-bit register reads
and writes, the lower 2 bits of the PCU address bus are not required to decode the address space for the
DNC. When reading a register that is less than 32 bits wide, zeros should be returned on the upper unused
bit(s) of dnc_pcu_data. Table 150 lists the configuration registers in the DNC.
Table 150. DNC configuration registers

Address     Register                 #bits Reset value  Description
Control registers
0x00        Reset                    1     0x1          A write to this register causes a reset of the DNC.
0x04        Go                       1     0x0          Writing 1 to this register starts the DNC. Writing
                                                        0 to this register halts the DNC.
                                                        When Go is asserted all counters, flags etc. are
                                                        cleared or given their initial value, but
                                                        configuration registers keep their values.
                                                        When Go is deasserted the state machines go to
                                                        their Idle states but all counters and
                                                        configuration registers keep their values.
                                                        This register can be read to determine if the
                                                        DNC is running (1 = running, 0 = stopped).
Setup registers (constant during processing)
0x10        MaxDot                   16    0x0000       The number of dots across a page minus 1. For
                                                        example, if a page contains 13824 dots, then
                                                        MaxDot will be 13823.
                                                        Note that this number may or may not be the
                                                        same as the number of dots across the printhead,
                                                        as some margining may be introduced in the PHI.
0x14        LFSR                     32    0x0000_0000  The current value of the LFSR register used as
                                                        the 32-bit maximum length random bit generator.
                                                        Users can write to this register to program a
                                                        seed value for the 32-bit maximum length random
                                                        bit generator. Must not be all 1s for taps
                                                        implemented in XNOR form. (It is expected that
                                                        writing a seed value will not occur during the
                                                        operation of the LFSR.)
                                                        This LFSR value could also have a possible use
                                                        as a random source in program code.
0x20        FixativeMask1            6     0x00         Defines the higher priority fixative plane(s).
                                                        Bit 0 represents the settings for plane 0, bit 1
                                                        for plane 1 etc. For each bit:
                                                        1 = the ink plane contains fixative.
                                                        0 = the ink plane does not contain fixative.
0x24        FixativeMask2            6     0x00         Defines the lower priority fixative plane(s).
                                                        Bit 0 represents the settings for plane 0, bit 1
                                                        for plane 1 etc. Used only when FixativeMask1
                                                        planes are dead. For each bit:
                                                        1 = the ink plane contains fixative.
                                                        0 = the ink plane does not contain fixative.
0x28        FixativeRequiredMask     6     0x00         Identifies the ink planes that require fixative.
                                                        Bit 0 represents the settings for plane 0, bit 1
                                                        for plane 1 etc. For each bit:
                                                        1 = the ink plane requires fixative.
                                                        0 = the ink plane does not require fixative
                                                        (e.g. ink is self-fixing).
0x30        DnTableStartAdr          17    0x0_0000     Start address of Dead Nozzle Table in DRAM,
                                                        specified in 256-bit words.
0x34        DnTableEndAdr            17    0x0_0000     End address of Dead Nozzle Table in DRAM,
                                                        specified in 256-bit words, i.e. the location
                                                        containing the last entry in the Dead Nozzle
                                                        Table. The Dead Nozzle Table should be aligned
                                                        to a 256-bit boundary; if necessary it can be
                                                        padded with null entries.
0x40 - 0x54 PlaneReplacePattern[5:0] 6x6   0x00         Defines the ink replacement pattern for each of
                                                        the 6 ink planes. PlaneReplacePattern[0] is the
                                                        ink replacement pattern for plane 0,
                                                        PlaneReplacePattern[1] is the ink replacement
                                                        pattern for plane 1, etc.
                                                        For each 6-bit replacement pattern for a plane,
                                                        a 1 in any bit position indicates an alternative
                                                        ink plane to be used for this plane.
0x58        DiffuseEnable            6     0x3F         Defines whether, after ink replacement, error
                                                        diffusion is allowed to be performed on each
                                                        plane. Bit 0 represents the settings for plane 0,
                                                        bit 1 for plane 1 etc. For each bit:
                                                        1 = error diffusion is enabled.
                                                        0 = error diffusion is disabled.
Debug registers (read only)
0x60        DncOutputDebug           8     N/A          Bit 7 = dwu_dnc_ready
                                                        Bit 6 = dnc_dwu_avail
                                                        Bits 5-0 = dnc_dwu_data
0x64        DncReplaceDebug          14    N/A          Bit 13 = edu_ready
                                                        Bit 12 = iru_avail
                                                        Bits 11-6 = iru_dn_mask
                                                        Bits 5-0 = iru_data
0x68        DncDiffuseDebug          14    N/A          Bit 13 = dwu_dnc_ready
                                                        Bit 12 = dnc_dwu_avail
                                                        Bits 11-6 = edu_dn_mask
                                                        Bits 5-0 = edu_data






29.5.3 Ink replacement unit 

Figure 208 shows a sub-block diagram for the ink replacement unit.

Figure 208. Sub-block diagram of ink replacement unit



29.5.3.1 Control unit

The control unit is responsible for reading the dead nozzle table from DRAM and making it available to
the DNC via the dead nozzle FIFO. The dead nozzle table is read from DRAM in single 256-bit accesses,
receiving the data from the DIU over 4 clock cycles (64 bits per cycle). The protocol and timing for read
accesses to DRAM are described in section 20.9.1 on page 208. Reading from DRAM is implemented by
means of the state machine shown in Figure 209.

All counters and flags should be cleared after reset. When Go transitions from 0 to 1 all counters and flags 
should take their initial value. While the Go bit is 1, the state machine requests a read access from the dead 
nozzle table in DRAM provided there is enough space in its FIFO. 

A modulo-4 counter, rd_count, is used to count each of the 64-bit values received in a 256-bit read access. It
is incremented whenever diu_dnc_rvalid is asserted. When Go is 1, dn_table_radr is set to
dn_table_start_adr. As each 64-bit value is returned, indicated by diu_dnc_rvalid being asserted,
dn_table_radr is compared to dn_table_end_adr:

• If rd_count equals 3 and dn_table_radr equals dn_table_end_adr, then dn_table_radr is updated to
dn_table_start_adr.

• If rd_count equals 3 and dn_table_radr does not equal dn_table_end_adr, then dn_table_radr is
incremented by 1.
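The address update rule can be checked with a small behavioural sketch. It assumes one loop iteration per diu_dnc_rvalid beat and ignores the request/acknowledge handshake and the FIFO-space check; the function name `dn_table_read_addresses` is hypothetical, while `rd_count` and `dn_table_radr` mirror the signals above.

```python
def dn_table_read_addresses(start_adr: int, end_adr: int, n_beats: int) -> list:
    """Return the dn_table_radr value in effect for each 64-bit beat
    returned by the DIU (one beat per diu_dnc_rvalid pulse)."""
    dn_table_radr = start_adr        # loaded from dn_table_start_adr when Go is 1
    rd_count = 0                     # modulo-4 count of beats in a 256-bit access
    addresses = []
    for _ in range(n_beats):
        addresses.append(dn_table_radr)
        if rd_count == 3:            # last 64 bits of the 256-bit word
            if dn_table_radr == end_adr:
                dn_table_radr = start_adr    # wrap to the start of the table
            else:
                dn_table_radr += 1
        rd_count = (rd_count + 1) % 4
    return addresses


# A two-word table (0x100..0x101) read three times in a row:
print(dn_table_read_addresses(0x100, 0x101, 12))
# [256, 256, 256, 256, 257, 257, 257, 257, 256, 256, 256, 256]
```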

A count is kept of the number of 64-bit values in the FIFO. When diu_dnc_rvalid is 1, data is written to the
FIFO by asserting wr_en, and fifo_contents and fifo_wr_adr are both incremented.
When fifo_contents[3:0] is greater than 0 and edu_ready is 1, dnc_hcu_ready is asserted to indicate that
the DNC is ready to accept dots from the HCU. If hcu_dnc_avail is also 1 then a dotadv pulse is sent to the
GenMask unit, indicating the DNC has accepted a dot from the HCU, and iru_avail is also asserted. After
Go is set, a single preload pulse is sent to the GenMask unit once the FIFO contains data.

When a rd_adv pulse is received from the GenMask unit, fifo_rd_adr[4:0] is incremented to select
the next 16-bit value. If fifo_rd_adr[1:0] = 11, then the next 64-bit value is read from the FIFO by asserting
rd_en, and fifo_contents[3:0] is decremented.



Figure 209. Dead nozzle table state machine



29.5.3.2 Dead nozzle FIFO



The dead nozzle FIFO is conceptually a 64-bit input, 16-bit output FIFO to account for the 64-bit data
transfers from the DIU and the individual 16-bit entries in the dead nozzle table that are used in the
GenMask unit. In reality, the FIFO is 8 entries deep and 64 bits wide (to accommodate two 256-bit
accesses).

On the DRAM side of the FIFO the write address is 64-bit aligned, while on the GenMask side the read
address is 16-bit aligned, i.e. the upper 3 bits are input as the read address for the FIFO and the lower 2 bits
are used to select 16 bits from the 64 bits (the 1st 16 bits read correspond to bits 15-0, the second 16 bits to
bits 31-16, etc.).
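The read-side addressing can be sketched as follows, modelling the FIFO as a list of eight 64-bit integers and `fifo_rd_adr` as the 5-bit read address; the helper name `read_dn_entry` is hypothetical.

```python
def read_dn_entry(fifo: list, fifo_rd_adr: int) -> int:
    """Return the 16-bit dead nozzle table entry selected by a 5-bit read address."""
    assert len(fifo) == 8, "8 entries x 64 bits holds two 256-bit accesses"
    word = fifo[(fifo_rd_adr >> 2) & 0x7]   # upper 3 bits pick the 64-bit entry
    lane = fifo_rd_adr & 0x3                # lower 2 bits pick the 16-bit lane
    return (word >> (16 * lane)) & 0xFFFF   # lane 0 = bits 15-0, lane 1 = bits 31-16, ...


fifo = [0x4444_3333_2222_1111] + [0] * 7
print(hex(read_dn_entry(fifo, 0)), hex(read_dn_entry(fifo, 3)))  # 0x1111 0x4444
```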

29.5.3.3 GenMask unit

The GenMask unit generates the 6-bit dn_mask that is sent to the replace unit. It consists of a 10-bit delta
counter and a mask register.

After Go is set, the GenMask unit will receive a preload pulse from the control unit indicating the first
dead nozzle table entry is available at the output of the dead nozzle FIFO and should be loaded into the
delta counter and mask register. A rd_adv pulse is generated so that the next dead nozzle table entry is
presented at the output of the dead nozzle FIFO. The delta counter is decremented every time a dotadv pulse
is received. When the delta counter reaches 0, it gets loaded with the current delta value output from the
dead nozzle FIFO, i.e. bits 15-6, and the mask register gets loaded with the mask output from the dead nozzle
FIFO, i.e. bits 5-0. A rd_adv pulse is then generated so that the next dead nozzle table entry is presented at
the output of the dead nozzle FIFO.

When the delta counter is 0 the value in the mask register is output as the dn_mask; otherwise the dn_mask
is all 0s.

The GenMask unit has no knowledge of the number of dots in a line; it simply loads a counter to count the
delta from one dead nozzle column to the next. Thus, as described in section 29.2 on page 446, the dead
nozzle table should include null identifiers if necessary so that the dead nozzle table covers the first and
last nozzle column in a line.
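The GenMask behaviour can be modelled in a few lines, under stated assumptions: a dot is treated as dead when the delta counter is 0 on that dot, the next entry is loaded on the following dotadv, and the table is read circularly (as the control unit wraps dn_table_radr). The exact delta convention is defined in section 29.2 and may differ by one from this model; `gen_dn_masks` and `unpack_entry` are hypothetical names.

```python
def unpack_entry(entry: int) -> tuple:
    """Split a 16-bit dead nozzle table entry into (delta, mask):
    bits 15-6 are the 10-bit delta, bits 5-0 the 6-bit dead nozzle mask."""
    return (entry >> 6) & 0x3FF, entry & 0x3F


def gen_dn_masks(table: list, n_dots: int) -> list:
    """Return the dn_mask produced for each of n_dots consecutive dots."""
    idx = 0
    delta, mask = unpack_entry(table[idx])   # preload pulse after Go
    idx = (idx + 1) % len(table)             # rd_adv: present the next entry
    counter = delta
    masks = []
    for _ in range(n_dots):
        masks.append(mask if counter == 0 else 0)
        # dotadv: reload on 0, otherwise count down towards the next dead column
        if counter == 0:
            counter, mask = unpack_entry(table[idx])
            idx = (idx + 1) % len(table)
        else:
            counter -= 1
    return masks


table = [(2 << 6) | 0b000001, (0 << 6) | 0b000010, (3 << 6) | 0b000100]
print(gen_dn_masks(table, 8))  # [0, 0, 1, 2, 0, 0, 0, 4]
```

A null identifier is simply an entry with a zero mask: it reloads the counter without marking any plane dead, which is how a table can span a long run of good nozzles or reach the end of a line.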

29.5.3.4 Replace unit 

Dead nozzle removal and ink replacement are implemented by the combinatorial logic shown in Figure
210. Dead nozzle removal is performed by bit-wise ANDing of the inverse of the dn_mask with the dot
value.

The ink replacement mechanism has 6 ink replacement patterns, one per ink plane, programmable by the 
CPU. The dead nozzle mask is ANDed with the dot data to see if there are any planes where the dot is 
active but the corresponding nozzle is dead. The resultant value forms an enable, on a per ink basis, for the 
ink replacement process. If replacement is enabled for a particular ink, the values from the corresponding 
replacement pattern register are ORed into the dot data. The output of the ink replacement process is then 
filtered so that error diffusion is only allowed for the planes in which error diffusion is enabled. 

The output of the ink replacement process is ORed with the resultant dot after dead nozzle removal. If the
dot position does not contain a dead nozzle then the dn_mask will be all 0s and the dot, hcu_dnc_data, will
be passed through unchanged.
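The removal and replacement equations can be sketched with each dot held as a 6-bit integer (bit p = ink plane p). The DiffuseEnable filtering and the exact gate structure of Figure 210 are not reproduced; the function name `replace_inks` and the example pattern values are hypothetical.

```python
PLANE_BITS = 0x3F  # 6 ink planes


def replace_inks(dot: int, dn_mask: int, replace_pattern: list) -> int:
    """Remove dots on dead nozzles and OR in the replacement inks.

    dot, dn_mask: 6-bit values (bit p corresponds to ink plane p).
    replace_pattern: six 6-bit PlaneReplacePattern values.
    """
    removed = dot & ~dn_mask & PLANE_BITS   # dead nozzle removal
    enable = dot & dn_mask                  # active dots on dead nozzles
    replacement = 0
    for plane in range(6):
        if enable & (1 << plane):
            replacement |= replace_pattern[plane]
    return removed | replacement


# Hypothetical setup: plane 0 is replaced by planes 1 and 2.
patterns = [0b000110, 0, 0, 0, 0, 0]
print(bin(replace_inks(0b000001, 0b000001, patterns)))  # 0b110
print(bin(replace_inks(0b101010, 0b000000, patterns)))  # 0b101010
```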






Figure 210. Logic for dead nozzle removal and ink replacement

29.5.4 Error Diffusion Unit

Figure 211 shows a sub-block diagram for the error diffusion unit.

Figure 211. Sub-block diagram of error diffusion unit



29.5.4.1 Random Bit Generator

The random bit value used to arbitrarily select the direction of diffusion is generated by a maximum length
32-bit LFSR. The tap points and feedback generation are shown in Figure 212. The LFSR generates a new
bit for each dot in a line regardless of whether the dot is dead or not, i.e. shifting of the LFSR is enabled
when advdot equals 1. The LFSR can be initialised with a 32-bit programmable seed value, random_seed.
This seed value is loaded into the LFSR whenever a write occurs to the RandomSeed register. Note that the
seed value must not be all 1s as this causes the LFSR to lock up.
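Figure 212 is not reproduced here, so the sketch below assumes a commonly published maximum length tap set for 32 bits (32, 22, 2, 1) with XNOR feedback; the actual SoPEC tap points may differ. It shows one shift per advdot and why an all-1s seed locks up an XNOR LFSR. The class name `XnorLfsr32` is hypothetical.

```python
MASK32 = 0xFFFF_FFFF
TAPS = (32, 22, 2, 1)  # assumed tap positions (1-based); Figure 212 may differ


class XnorLfsr32:
    """32-bit Fibonacci LFSR with XNOR feedback (hypothetical model)."""

    def __init__(self, seed: int) -> None:
        if (seed & MASK32) == MASK32:
            raise ValueError("an all-1s seed locks up an XNOR LFSR")
        self.state = seed & MASK32

    def step(self) -> int:
        """Shift once (advdot = 1) and return the new random bit."""
        parity = 0
        for tap in TAPS:
            parity ^= (self.state >> (tap - 1)) & 1
        bit = parity ^ 1                       # XNOR of the tapped bits
        self.state = ((self.state << 1) | bit) & MASK32
        return bit


lfsr = XnorLfsr32(seed=0)
print([lfsr.step() for _ in range(4)])  # [1, 0, 0, 1]
```

With XNOR feedback the all-0s state is legal (unlike an XOR LFSR), while all 1s feeds back another 1 forever, which is why the seed restriction applies to all 1s.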
Figure 212. Maximum length 32-bit LFSR used for random bit generation



29.5.4.2 Advance Dot Unit

The advance dot unit is responsible for determining in a given cycle whether or not the error diffuse unit
will accept a dot from the ink replacement unit or make a dot available to the fixative correct unit and on to
the DWU. It therefore receives the dwu_dnc_ready control signal from the DWU and the iru_avail flag from
the ink replacement unit, and generates the dnc_dwu_avail and edu_ready control flags.

Only the dwu_dnc_ready signal needs to be checked to see if a dot can be accepted; the unit asserts edu_ready
to indicate this. If the error diffuse unit is ready to accept a dot and the ink replacement unit has a dot
available, then an advdot pulse is given to shift the dot into the pipeline in the diffuse unit. Note that since the
error diffusion operates on 3 dots, the advance dot unit ignores dwu_dnc_ready initially until 3 dots have
been accepted by the diffuse unit. Similarly, dnc_dwu_avail is not asserted until the diffuse unit contains 3
dots and the ink replacement unit has a dot available.
29.5.4.3 Diffuse Unit

The diffuse unit contains the combinatorial logic to implement the truth table from Table 147. The diffuse
unit receives a dot consisting of 6 color planes (1 bit per plane) as well as an associated 6-bit dead nozzle
mask value.

Error diffusion is applied to all 6 planes of the dot in parallel. Since error diffusion operates on 3 dots, the
diffuse unit has a pipeline of 3 dots and their corresponding dead nozzle mask values. The first dot
received is referred to as dot A, the second as dot B, and the third as dot C. Dots are shifted along the
pipeline whenever advdot is 1. A count is also kept of the number of dots received. It is incremented
whenever advdot is 1, and wraps to 0 when it reaches max_dot. When the dot count is 0, dot C corresponds to the
first dot in a line. When the dot count is 1, dot A corresponds to the last dot in a line.

In any given set of 3 dots only dot B can be defined as containing a dead nozzle(s). Dead nozzles are
identified by bits set in iru_dn_mask. If dot B contains a dead nozzle(s), the corresponding bit(s) in dot A, dot
C, the dead nozzle mask value for A, the dead nozzle mask value for C, the dot count, as well as the
random bit value are input to the truth table logic and the dots A, B and C assigned accordingly. If dot B does
not contain a dead nozzle then the dots are shifted along the pipeline unchanged.



29.5.5 Fixative Correction Unit 

The fixative correction unit consists of combinatorial logic to implement fixative correction as defined in
Table 151. For each output dot the DNC determines if fixative is required for the new compensated dot
data word and whether fixative is activated already for that dot.

FixativePresent  = ((FixativeMask1 | FixativeMask2) & edu_data) != 0
FixativeRequired = (FixativeRequiredMask & edu_data) != 0

It then looks up the truth table to see what action, if any, needs to be taken. 



Table 151. Truth table for fixative correction

Fixative  Fixative  Action                    Output
present   required
1         1         Output dot as is.         dnc_dwu_data = edu_data
1         0         Clear fixative plane.     dnc_dwu_data = (edu_data) & ~(FixativeMask1 | FixativeMask2)
0         1         Attempt to add fixative.  if ((FixativeMask1 & DnMask) != 0)
                                                  dnc_dwu_data = (edu_data) | (FixativeMask2 & ~DnMask)
                                              else
                                                  dnc_dwu_data = (edu_data) | (FixativeMask1)
0         0         Output dot as is.         dnc_dwu_data = edu_data



When attempting to add fixative, the DNC first tries to add it into the plane defined by FixativeMask1.
However, if this plane is dead then it tries to add fixative by placing it into the plane defined by
FixativeMask2. Note that if FixativeMask1 and FixativeMask2 are both all 0s then the dot data will
not be changed.
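Table 151 can be sketched as a single Python function on 6-bit plane masks. The function name and the example register values are hypothetical; `dn_mask` stands for the DnMask term of Table 151.

```python
PLANES = 0x3F


def fixative_correct(edu_data: int, dn_mask: int, fixative_mask1: int,
                     fixative_mask2: int, fixative_required_mask: int) -> int:
    """Return dnc_dwu_data for one dot following the Table 151 actions."""
    fixative_planes = fixative_mask1 | fixative_mask2
    present = (fixative_planes & edu_data) != 0
    required = (fixative_required_mask & edu_data) != 0
    if present and not required:
        return edu_data & ~fixative_planes & PLANES       # clear fixative plane
    if required and not present:                          # attempt to add fixative
        if (fixative_mask1 & dn_mask) != 0:               # a FixativeMask1 plane is dead
            return edu_data | (fixative_mask2 & ~dn_mask & PLANES)
        return edu_data | fixative_mask1
    return edu_data                                       # output dot as is


# Hypothetical setup: plane 5 holds fixative, plane 4 is the backup,
# and planes 0-3 require fixative.
print(bin(fixative_correct(0b000001, 0, 0b100000, 0b010000, 0b001111)))  # 0b100001
```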
30 Dotline Writer Unit (DWU)

30.1 Overview 

The Dotline Writer Unit (DWU) receives 1 dot (6 bits) of color information per cycle from the DNC. Dot
data received is bundled into 256-bit words and transferred to the DRAM. The DWU (in conjunction with
the LLU) implements a dot line FIFO mechanism to compensate for the physical placement of nozzles in a
printhead, and provides data rate smoothing to allow for local complexities in the dot data generate pipeline.
Figure 213. High level data flow diagram of DWU in context



30.2 Physical requirement imposed by the printhead

The physical placement of nozzles in the printhead means that in one firing sequence of all nozzles, dots
will be produced over several print lines. The printhead consists of 12 rows of nozzles, one for each color
of odd and even dots. Odd and even nozzles are separated by D2 print lines and nozzles of different colors
are separated by D1 print lines. See Figure 214 for reference. The first color to be printed is the first row of
nozzles encountered by the incoming paper. In the example this is color 0 odd, although this is dependent on
the printhead type (see Section 35 Memjet Printhead for other printhead arrangements). Paper passes under
the printhead moving downwards.
Figure 214. Printhead Nozzle Layout for conceptual 36 Nozzle bilithic printhead

For example, if the physical separation of each half row is 80 µm, this equates to D1 = D2 = 5 print lines at
1600 dpi. This means that in one firing sequence, color 0 odd nozzles will fire on dotline L, color 0 even
nozzles will fire on dotline L-D2, color 1 odd nozzles will fire on dotline L-D1-D2, and so on over the 6 color
planes' odd and even nozzles. The total number of lines fired over is given as 0 + 5 + 5 + ... + 5 = 0 + 11 × 5 = 55.
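The firing offsets in this example (and in Table 152) follow a simple pattern, sketched below. The sketch assumes, per the definitions above, that the odd and even rows of a color are D2 lines apart and successive colors are D1 lines apart; `firing_offset` and its `first_sense` parameter are hypothetical names used for illustration.

```python
def firing_offset(color: int, sense: str, d1: int = 5, d2: int = 5,
                  first_sense: str = "odd") -> int:
    """Number of lines behind dotline L at which a half color fires.

    color: color index 0..5
    sense: "odd" or "even"
    d1, d2: inter-color and odd/even spacing in print lines
    first_sense: which half row of each color the paper meets first
    """
    offset = color * (d1 + d2)
    if sense != first_sense:
        offset += d2
    return offset


offsets = [firing_offset(c, s) for c in range(6) for s in ("odd", "even")]
print(offsets[-1], sum(offsets))  # 55 330
```

The maximum offset, 55, is the span quoted above; the sum over all twelve half colors, 330, is the minimum number of half lines the dot line store must hold (section 30.4).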

See Figure 215 for example diagram. 
Figure 215. Paper and printhead nozzles relationship (example with D1=D2=5)

It is expected that the physical spacing of the printhead nozzles will be 80 µm (or 5 dot lines), although
there is no dependency on nozzle spacing. The DWU is configurable to allow other line nozzle spacings. 



Table 152. Relationship between Nozzle color/sense and line firing

         Sense  Line   Sense  Line
Color 0  even   L      even   L-5
         odd    L-5    odd    L
Color 1  even   L-10   even   L-15
         odd    L-15   odd    L-10
Color 2  even   L-20   even   L-25
         odd    L-25   odd    L-20
Color 3  even   L-30   even   L-35
         odd    L-35   odd    L-30
Color 4  even   L-40   even   L-45
         odd    L-45   odd    L-40
Color 5  even   L-50   even   L-55
         odd    L-55   odd    L-50



30.3 Line rate de-coupling 

The DWU block is required to compensate for the physical spacing between lines of nozzles. It does this 
by storing dot lines in a FIFO (in DRAM) until such time as they are required by the LLU for dot data 
transfer to the printhead interface. Colors are stored separately because they are needed at different times 
by the LLU. The dot line store must store enough lines to compensate for the physical line separation of
the printhead but can optionally store more lines to allow system level data rate variation between the read 
(printhead feed) and write sides (dot data generation pipeline) of the FIFOs. 



A logical representation of the FIFOs is shown in Figure 216, where N is defined as the optional number of
extra half lines in the dot line store for data rate de-coupling.
Figure 216. Dot line store logical representation



30.4 Dot line store storage requirements 

For an arbitrary page width of d dots (where d is even), the number of dots per half line is d/2. 

For interline spacing of $D_2$ and inter-color spacing of $D_1$, with $C$ colors of odd and even half lines, the
number of half lines of storage is $C\,((C - 1)(D_2 + D_1) + D_2)$.

For $N$ extra half line stores for each color odd and even, the storage is given by $N \times C \times 2$.
The total storage requirement is $\left(C\,((C - 1)(D_2 + D_1) + D_2) + N \times C \times 2\right) \times d/2$ bits.
Note that when determining the storage requirements for the dot line store, the number of dots per line is
the page width and not necessarily the printhead width. The page width is often less than the printhead
width by the dot margin. They can be the same size for full bleed printing.

For example, in an A4 page a line consists of 13824 dots at 1600 dpi, or 6912 dots per half dot line. To
store just enough dot lines to account for an inter-line nozzle spacing of 5 dot lines it would take 55 half
dot lines for color 5 odd, 50 dot lines for color 5 even and so on, giving 55 + 50 + 45 + ... + 10 + 5 + 0 = 330 half dot
lines in total. If it is assumed that N = 4, then the storage required to store 4 extra half lines per color is
4 × 12 = 48, giving 330 + 48 = 378 half dot lines in total. Each half dot line is 6912 dots; at 1 bit per dot this gives a
total storage requirement of 6912 dots × 378 half dot lines / 8 bits = approx. 319 Kbytes. Similarly, for an
A3 size page with 19488 dots per line, 9744 dots per half line × 378 half dot lines / 8 = approx. 450
Kbytes.
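The worked example can be checked with a short calculation. This sketch takes the half line count as the sum of all twelve half-color delays plus 2NC extra half lines, which reproduces the 378 half dot lines and approximately 319 Kbytes quoted above for A4; the function name `dot_line_store_bytes` is hypothetical.

```python
def dot_line_store_bytes(page_dots: int, colors: int = 6, d1: int = 5,
                         d2: int = 5, n_extra: int = 4) -> int:
    """Dot line store size in bytes for a page d dots wide (1 bit per dot)."""
    assert page_dots % 2 == 0, "d must be even"
    half_line_dots = page_dots // 2
    base_half_lines = colors * ((colors - 1) * (d1 + d2) + d2)  # 330 for the example
    extra_half_lines = n_extra * colors * 2                      # 48 for N = 4
    return (base_half_lines + extra_half_lines) * half_line_dots // 8


print(dot_line_store_bytes(13824))  # 326592 bytes, about 319 Kbytes (A4)
print(dot_line_store_bytes(19488))  # 460404 bytes (A3)
```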

Table 153. Storage requirement for dot line store 
The potential size of the dot line store makes it infeasible to implement in on-chip SRAM, requiring
the dot line store to be implemented in embedded DRAM. This allows a configurable dotline store where
unused storage can be redistributed for use by other parts of the system.



30.5 Local buffering 

An embedded DRAM is expected to be of the order of 256 bits wide, which results in 27 words per half
line of an A4 page, and 39 words per half line of A3 (9744 dots / 256 bits, rounded up). This requires
27 words × 12 half colors (6 colors odd and even) = 324 × 256-bit DRAM accesses over a dotline print time,
equating to 6 bits per cycle (equal to the DNC generate rate of 6 bits per cycle). Each half color is required to
be double buffered: while one buffer is being filled, the other buffer is being written to DRAM. This results in
256 bits × 2 buffers × 12 half colors, i.e. 6144 bits in total.

The buffer requirement can be reduced by using 1.5x buffering, where the DWU is filling 128 bits while the
remaining 256 bits are being written to DRAM. While this reduces the required local buffering, it
increases the peak bandwidth requirement to the DRAM. With 2x buffering the average and peak DRAM
bandwidth requirements are the same, at 6 bits per cycle; with 1.5x buffering the average
DRAM bandwidth requirement is 6 bits per cycle but the peak bandwidth requirement is 12 bits per cycle.
The amount of buffering used will depend on the DRAM bandwidth available to the DWU unit.
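The figures above can be reproduced with a small calculation, assuming the DNC delivers one dot per cycle so that one dotline takes as many cycles as there are dots in the line. The function name `dwu_buffering` and its dictionary keys are hypothetical.

```python
import math


def dwu_buffering(page_dots: int, half_colors: int = 12, word_bits: int = 256) -> dict:
    """Per-dotline DRAM traffic and local buffer sizes for the DWU.

    Assumes the DNC delivers one dot per cycle, so one dotline takes
    page_dots cycles.
    """
    words_per_half_line = math.ceil(page_dots / 2 / word_bits)
    accesses = words_per_half_line * half_colors
    return {
        "words_per_half_line": words_per_half_line,
        "accesses_per_line": accesses,
        "avg_bits_per_cycle": accesses * word_bits / page_dots,
        "buffer_bits_2x": word_bits * 2 * half_colors,                 # double buffering
        "buffer_bits_1_5x": (word_bits + word_bits // 2) * half_colors,  # 256 + 128 bits
    }


print(dwu_buffering(13824))
```

For the A4 width this gives 27 words per half line, 324 accesses per line, an average of 6.0 bits per cycle, 6144 bits of local buffering with 2x buffering and 4608 bits with 1.5x buffering.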
Figure 217. Comparison of 1.5x v 2x buffering

Should the DWU fail to get the required DRAM access within the specified time, the DWU will stall the
DNC data generation. The DWU will issue the stall in sufficient time for the DNC to respond and still not
cause a FIFO overrun. Should the stall persist for a sufficiently long time, the PHI will be starved of data
and be unable to deliver data to the printhead in time. The sizing of the dotline store FIFO and internal
FIFOs should be chosen so as to prevent such a stall happening.



30.6 Dotline data in memory 



The dot data shift register order in the printhead is shown in Figure 214 (the transmit order is the opposite 
of the shift register order). In the example the type 0 printhead IC transmit order is increasing even color 
data followed by decreasing odd color data. The type 1 printhead IC transmit order is decreasing odd color 
data followed by increasing even color data. For both printhead ICs the even data is always increasing 
order and odd data is always decreasing. The PHI controls which printhead IC data gets shifted to. 

From this it is beneficial to store even data in increasing order in DRAM and odd data in decreasing order.
While this order suits the example printhead, other printheads exist where it would be beneficial to store 
even data in decreasing order, and odd data in increasing order, hence the order is configurable. The order 
that data is stored in memory is controlled by setting the ColorLineSense register. 
The dot order in DRAM for increasing and decreasing sense is shown in Figure 218 and Figure 219
respectively. For each line in the dot store the order is the same (although for odd lines the numbering will
be different, the order will remain the same). Dot data from the DNC is always received in increasing dot
number order. For increasing sense, dot data is bundled into 256-bit words and written in increasing order
in DRAM, word 0 first, then word 1, and so on to word N, where word N is the last word in a line.

For decreasing sense dot data is also bundled into 256-bit words, but is written to DRAM in decreasing 
order, i.e. word N is written first then word N-1 and so on to word 0. For both increasing and decreasing 
sense the data is aligned to bit 0 of a word, i.e. increasing sense always starts at bit 0, decreasing sense 
always finishes at bit 0. 
Figure 218. Even dot order in DRAM (Increasing Sense, 13320 dot wide line)



Figure 219. Even dot order in DRAM (Decreasing Sense, 13320 dot wide line)



Each half color is configured independently of any other color. The ColorBaseAdr register specifies the 
position where data for a particular dotline FIFO will begin writing to. Note that for increasing sense colors
the ColorBaseAdr register specifies the address of the first word of the first line of the FIFO, whereas for
decreasing sense colors the ColorBaseAdr register specifies the address of the last word of the first line of the
FIFO.

Dot data received from the DNC is bundled in 256-bit words and transferred to the DRAM. Each line of
data is stored consecutively in DRAM, with each line separated by ColorLineInc number of words.

For each line stored in DRAM the DWU increments the line count and calculates the DRAM address for 
the next line to store. 
This process continues until ColorFifoSize number of lines are stored, after which the DRAM address will
wrap back to the ColorBaseAdr address.

Figure 220. Dotline FIFO data structure in DRAM

As each line is written to the FIFO, the DWU increments the FifoFillLevel register, and as the LLU reads a
line from the FIFO the FifoFillLevel register is decremented. The LLU indicates that it has completed
reading a line by a high pulse on the llu_dwu_line_rd line.

When the number of lines stored in the FIFO is equal to the MaxWriteAhead value, the DWU will indicate
to the DNC that it is no longer able to receive data (i.e. a stall) by deasserting the dwu_dnc_ready signal.
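The addressing and flow control described above can be sketched for one increasing-sense half color. The class and method names are hypothetical; addresses are in 256-bit words as in the register descriptions, and the decreasing-sense case (where ColorBaseAdr points at the last word of the first line) is not modelled.

```python
class DotlineFifo:
    """Behavioural sketch of one half-color dotline FIFO in DRAM."""

    def __init__(self, color_base_adr: int, color_line_inc: int,
                 color_fifo_size: int, max_write_ahead: int) -> None:
        self.color_base_adr = color_base_adr
        self.color_line_inc = color_line_inc
        self.color_fifo_size = color_fifo_size
        self.max_write_ahead = max_write_ahead
        self.line = 0             # index of the next line slot to write
        self.fifo_fill_level = 0  # lines written but not yet read by the LLU

    def dwu_dnc_ready(self) -> bool:
        """Deasserted (stall) once the fill level reaches MaxWriteAhead."""
        return self.fifo_fill_level < self.max_write_ahead

    def write_line(self) -> int:
        """Store one line; return the DRAM word address of its first word."""
        if not self.dwu_dnc_ready():
            raise RuntimeError("DWU stalled: FIFO at MaxWriteAhead")
        adr = self.color_base_adr + self.line * self.color_line_inc
        self.line = (self.line + 1) % self.color_fifo_size  # wrap to ColorBaseAdr
        self.fifo_fill_level += 1
        return adr

    def llu_line_read(self) -> None:
        """Called for each llu_dwu_line_rd pulse from the LLU."""
        self.fifo_fill_level -= 1


fifo = DotlineFifo(color_base_adr=0x1000, color_line_inc=27,
                   color_fifo_size=3, max_write_ahead=2)
print(hex(fifo.write_line()), hex(fifo.write_line()), fifo.dwu_dnc_ready())  # 0x1000 0x101b False
```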

The ColorEnable register determines which color planes should be processed. If a plane is turned off, data
is ignored for that plane and no DRAM accesses for that plane are generated.
30.7 Implementation

30.7.1 Definitions of I/O 



Table 154. DWU I/O Definition 
Indicates the number of lines in the FIFO before the line increment will wrap around in memory.
Bus 0,1 - Even, Odd line color 0
Bus 2,3 - Even, Odd line color 1
Bus 4,5 - Even, Odd line color 2
Bus 6,7 - Even, Odd line color 3
Bus 8,9 - Even, Odd line color 4
Bus 10,11 - Even, Odd line color 5


Port name            Pins  I/O  Description

PCU Interface
pcu_dwu_sel          1     In   Block select from the PCU. When pcu_dwu_sel is high both
                                pcu_adr and pcu_dataout are valid.
pcu_rwn              1     In   Common read/not-write signal from the PCU.
pcu_adr[7:2]         6     In   PCU address bus. Only 6 bits are required to decode the
                                address space for this block.
pcu_dataout[31:0]    32    In   Shared write data bus from the PCU.
dwu_pcu_rdy          1     Out  Ready signal to the PCU. When dwu_pcu_rdy is high it indicates
                                the last cycle of the access. For a write cycle this means
                                pcu_dataout has been registered by the block and for a read
                                cycle this means the data on dwu_pcu_data is valid.
dwu_pcu_data[31:0]   32    Out  Read data bus to the PCU.

DIU Interface
dwu_diu_wreq         1     Out  DWU requests DRAM write. A write request must be accompanied
                                by a valid write address together with valid write data and
                                a write valid.
dwu_diu_wadr[21:5]   17    Out  Write address to DIU, 17 bits wide (256-bit aligned word).
diu_dwu_wack         1     In   Acknowledge from DIU that write request has been accepted and
                                new write address can be placed on dwu_diu_wadr.
dwu_diu_data[63:0]   64    Out  Data from DWU to DIU. 256-bit word transfer over 4 cycles:
                                first 64 bits are bits 63:0 of the 256-bit word,
                                second 64 bits are bits 127:64 of the 256-bit word,
                                third 64 bits are bits 191:128 of the 256-bit word,
                                fourth 64 bits are bits 255:192 of the 256-bit word.
dwu_diu_wvalid       1     Out  Signal from DWU indicating that data on dwu_diu_data is valid.
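The four-beat transfer on dwu_diu_data can be modelled as below. This is a minimal Python sketch, not part of the design: split_256 and join_256 are hypothetical helper names, and the 256-bit word is represented as a Python integer with bit 0 as its least significant bit.

```python
MASK64 = (1 << 64) - 1


def split_256(word: int) -> list[int]:
    """Split a 256-bit word into the four 64-bit beats sent on dwu_diu_data."""
    if not 0 <= word < (1 << 256):
        raise ValueError("word must fit in 256 bits")
    # Beat k (k = 0..3) carries bits 64*k+63 down to 64*k, so beat 0 is bits 63:0.
    return [(word >> (64 * k)) & MASK64 for k in range(4)]


def join_256(beats: list[int]) -> int:
    """Reassemble four 64-bit beats (beat 0 first) into one 256-bit word."""
    word = 0
    for k, beat in enumerate(beats):
        word |= (beat & MASK64) << (64 * k)
    return word
```

The same beat ordering applies to diu_data on the LLU read side.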



Doc: SoPEC^hardware^design 
Version: 2.3 



S3 Proprietary Document 



29 Nov 2002 
Page 471 



SoPEC : Hardware Design 



30.7.2 DWU partition
Figure 221. DWU partition



30.7.3 Configuration registers 

The configuration registers in the DWU are programmed via the PCU interface. Refer to section 21.8.2 on 
page 257 for a description of the protocol and timing diagrams for reading and writing registers in the 
DWU. Note that since addresses in SoPEC are byte aligned and the PCU only supports 32-bit register 
reads and writes, the lower 2 bits of the PCU address bus are not required to decode the address space for 
the DWU. When reading a register that is less than 32 bits wide, zeros should be returned on the upper
unused bit(s) of dwu_pcu_data. Table 155 lists the configuration registers in the DWU.
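The two decoding rules above (bits 1:0 of the byte address carry no information, and unused upper bits read back as zero) can be illustrated with a small Python model. The REGISTERS subset and the pcu_read function are hypothetical; the offsets and widths are taken from Table 155.

```python
# Hypothetical subset of the DWU register map: byte offset -> (name, width in bits).
REGISTERS = {
    0x00: ("Reset", 1),
    0x04: ("Go", 1),
    0x74: ("ColorEnable", 6),
    0x78: ("MaxWriteAhead", 8),
    0x84: ("FifoFillLevel", 8),
}


def pcu_read(values: dict[str, int], byte_adr: int) -> int:
    """Model a 32-bit PCU read of a DWU register."""
    # Registers are 32-bit aligned, so only pcu_adr[7:2] selects the register.
    word_adr = byte_adr & ~0x3
    name, width = REGISTERS[word_adr]
    # Bits above the register width are returned as zero on dwu_pcu_data.
    return values.get(name, 0) & ((1 << width) - 1)
```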



Table 155. DWU registers description

Address      Register              #Bits  Reset    Description

Control Registers
0x00         Reset                 1      0x1      Active low synchronous reset, self de-activating. A
                                                   write to this register will cause a DWU block reset.
0x04         Go                    1      0x0      Active high bit indicating the DWU is programmed
                                                   and ready to use. A low to high transition will cause
                                                   DWU block internal states to reset (configuration
                                                   registers are not reset).

Dot Line Store Configuration
0x08 - 0x38  ColorBaseAdr[11:0]    12x17  0x00000  Specifies the base address (in words) in memory
                                                   where data from a particular half color (N) will be
                                                   placed.
0x30 - 0x60  ColorFifoSize[11:0]   12x8   0x00     Indicates the number of lines in the FIFO before
                                                   the line increment will wrap around in memory.
                                                   Bus 0,1 - Even, Odd line color 0
                                                   Bus 2,3 - Even, Odd line color 1
                                                   Bus 4,5 - Even, Odd line color 2
                                                   Bus 6,7 - Even, Odd line color 3
                                                   Bus 8,9 - Even, Odd line color 4
                                                   Bus 10,11 - Even, Odd line color 5
0x70         ColorLineSense        2      0x2      Specifies whether data written to DRAM for this
                                                   half color is increasing or decreasing sense.
                                                   0 - Decreasing sense
                                                   1 - Increasing sense
                                                   Bit 0 defines even color sense.
                                                   Bit 1 defines odd color sense.
0x74         ColorEnable           6      0x3f     Indicates whether a particular color is active or not.
                                                   When inactive no data is written to DRAM for that
                                                   color.
                                                   0 - Color off
                                                   1 - Color on
                                                   One bit per color, bit 0 is Color 0 and so on.
0x78         MaxWriteAhead         8      0x00     Specifies the maximum number of lines that the
                                                   DWU can be ahead of the LLU.
0x7C         LineSize              16     0x0000   Indicates the number of dots per line.

Working Registers
0x80         LineDotCnt            16     0x0000   Indicates the number of remaining dots in the
                                                   current line. (Read Only)
0x84         FifoFillLevel         8      0x00     Number of lines in the FIFO, written to but not
                                                   read. (Read Only)



A low to high transition of the Go register causes the internal states of the DWU to be reset. All configuration
registers will remain the same. The block indicates the transition to other blocks via the dwu_go_pulse
signal.



The ColorLineInc bus specifies the number of addresses (in 256-bit words) between successive half lines
in the dot line store. It is derived from the LineSize register by rounding up to the nearest 256-bit word.
Since each 256-bit word holds 256 dots of one half color, i.e. 512 dots of the full line, the rounding is done
on bits 8:0 of LineSize. The same value is used for all half colors.

if (line_size[8:0] != 0) then
    color_line_inc[7:0] = line_size[15:9] + 1
else
    color_line_inc[7:0] = line_size[15:9]
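A Python sketch of the ColorLineInc calculation follows. It assumes, as the counter bit fields in Section 30.7.5 imply (count[0] selects odd/even and count[8:1] addresses a 256-bit word), that one 256-bit word holds 256 dots of a half color, i.e. 512 dots of the full line. color_line_inc is a hypothetical helper name.

```python
def color_line_inc(line_size: int) -> int:
    """Number of 256-bit words per half line, rounded up (model of ColorLineInc)."""
    words = line_size >> 9        # whole 512-dot (full line) groups
    if line_size & 0x1FF:         # any remaining dots still need a partial word
        words += 1
    return words
```

For the 13320 dot line used in Figures 218 and 219 this gives 27 words per half line, i.e. 6660 dots stored in 27 x 256 = 6912 bits.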



30.7.4 Fifo fill level



The DWU keeps a running total of the number of lines in the dot store FIFO. Each time the DWU writes a
line to DRAM (determined by the DIU interface subblock and signalled via line_wr) it increments the
fill level and signals the line increment to the LLU (pulse on dwu_llu_line_wr). Conversely, if it receives an
active llu_dwu_line_rd pulse from the LLU, the fill level is decremented. If the fill level increases to the
programmed max level (max_write_ahead) then the DWU stalls and indicates this back to the DNC by
de-asserting the dwu_dnc_ready signal.

If one or more of the DIU buffers fill, the DIU interface signals the fill level logic via the buf_full signal,
which in turn causes the DWU to de-assert the dwu_dnc_ready signal to stall the DNC. The buf_full signal
will remain active until the DIU services a pending request from the full buffer, reducing the buffer
level.

The DWU does not increment the fill level until a complete line of dot data is in DRAM, not just a
complete line received from the DNC. This ensures that the LLU cannot start reading a partial line from
DRAM before the DWU has finished writing the line.

The fill level is reset to zero each time a new page is started, on receiving a pulse via the dwu_go_pulse 
signal. 

The line fifo fill level can be read by the CPU via the PCU at any time by accessing the FifoFillLevel regis- 
ter. 
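The fill-level bookkeeping described above can be sketched as a small Python class. The class and method names are hypothetical, and the model ignores the case where a line write and a line read arrive in the same cycle.

```python
class FifoFillLevel:
    """Model of the DWU dot line FIFO fill level (FifoFillLevel register)."""

    def __init__(self, max_write_ahead: int) -> None:
        self.max_write_ahead = max_write_ahead  # MaxWriteAhead register
        self.fill_level = 0

    def go_pulse(self) -> None:
        """New page (dwu_go_pulse): the fill level is reset to zero."""
        self.fill_level = 0

    def line_written(self) -> None:
        """A complete line is in DRAM; the LLU is told via dwu_llu_line_wr."""
        self.fill_level += 1

    def line_read(self) -> None:
        """The LLU has finished reading a line (llu_dwu_line_rd pulse)."""
        if self.fill_level == 0:
            raise RuntimeError("LLU read from an empty dot line FIFO")
        self.fill_level -= 1

    def dnc_ready(self, buf_full: bool = False) -> bool:
        """Level of dwu_dnc_ready: low at MaxWriteAhead lines or when a DIU buffer is full."""
        return self.fill_level < self.max_write_ahead and not buf_full
```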
30.7.5 Buffer address generator
Figure 222. Buffer address generator sub-block

30.7.5.1 Buffer address generator description

The buffer address generator subblock is responsible for accepting data from the DNC and writing it to the 
DIU buffers in the correct order. 

The buffer address and active bit-write for a particular dot data write is calculated by the buffer address 
generator based on the dot count of the current line, programmed sense of the color and the line size. 

All configuration registers should be programmed while the Go bit is set to zero. Once programming is
complete the block can be enabled by setting the Go bit to one. The transition from zero to one will cause
the internal states to reset.

If the color_line_sense signal for a color is one (i.e. increasing) then the bit-write generation is
straightforward, as dot data is aligned with a 256-bit boundary. So for the first dot in that color, bit 0 of the
wr_bit bus will be active (in buffer word 0), for the second dot bit 1 is active and so on to the 256th dot
where bit 63 is active (in buffer word 3). This is repeated for all 256-bit words until the final word where
only a partial number of bits are written before the word is transferred to DRAM.

If the color_line_sense signal for a color is zero (i.e. decreasing) the bit-write generation for that color is
adjusted by an offset calculated from the pre-programmed line length (line_size). The offset adjusts the bit
write to allow the line to finish on a 256-bit boundary. For example, if the line length was 400, for the first
dot received bit 7 of wr_bit is active (buffer word 3), for the second dot bit 6 (buffer word 3), and so on down
to the 200th dot of data with bit 0 of wr_bit active (buffer word 0). The line length is halved to 200 because
each color is split into odd and even half lines.
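To make the two bit-write schemes concrete, the following Python sketch maps the n-th dot of a half line to its storage position. It assumes 0-based dot numbering and that a decreasing-sense half line is stored so that its last dot lands on bit 0 of a 256-bit word; dot_position is a hypothetical helper.

```python
def dot_position(n: int, half_line: int, increasing: bool) -> tuple[int, int, int]:
    """Return (256-bit word in the half line, 64-bit word within it, bit within that)."""
    if not 0 <= n < half_line:
        raise ValueError("dot index outside the half line")
    # Increasing sense fills from position 0 upwards; decreasing sense starts at the
    # top of the half line so that the last dot received lands on position 0.
    pos = n if increasing else half_line - 1 - n
    word256, offset = divmod(pos, 256)
    word64, bit = divmod(offset, 64)
    return word256, word64, bit
```

For the 400 dot example, dot_position(0, 200, False) gives (0, 3, 7) and dot_position(199, 200, False) gives (0, 0, 0).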
30.7.5.2 Bit-write decode



The buffer address generator contains 2 instances of the bit-write decode, one configured for odd dot data
and the other for even. The counter (either up or down counter) used to generate the addresses is selected by
the color_line_sense signal. Each block determines if it is active on this cycle by comparing its configured
type with the current dot count address and the data_active signal.

Count bit 0 identifies the dot as odd or even. The wr_bit bus is a direct decoding of the next 6 count bits
(count[6:1]), and the DIU buffer address is the remaining higher bits of the counter (count[10:7]).

The signal generation is given as follows: 
// determine the counter to use
if (color_line_sense == 1) then
    count = up_cnt[10:0]
else
    count = dn_cnt[10:0]
// determine if active, based on instance type
wr_en = data_active & (count[0] == odd_even_type)   // odd = 1, even = 0
// determine the bit write value
wr_bit[63:0] = decode(count[6:1])
// determine the buffer 64-bit address
wr_adr[3:0] = count[10:7]
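A behavioural Python sketch of one bit-write decode instance, following the pseudocode above. bit_write_decode is a hypothetical name, and wr_bit is returned as a one-hot integer rather than a 64-bit bus.

```python
def bit_write_decode(count: int, odd_even_type: int, data_active: bool) -> tuple[bool, int, int]:
    """Return (wr_en, wr_bit, wr_adr) for one instance (odd_even_type: 1 = odd, 0 = even)."""
    count &= 0x7FF                               # 11-bit counter, count[10:0]
    wr_en = bool(data_active) and (count & 1) == odd_even_type
    wr_bit = 1 << ((count >> 1) & 0x3F)          # one-hot decode of count[6:1]
    wr_adr = (count >> 7) & 0xF                  # 64-bit buffer word address, count[10:7]
    return wr_en, wr_bit, wr_adr
```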



30.7.5.3 Up counter generator

The up counter increments for each new dot and is used to determine the write position of the dot in the
DIU buffers for increasing sense data. At the end of each line of dot data (as indicated by line_fin), the
counter is rounded up to the nearest 256-bit word boundary. This causes the DIU buffers to be flushed to
DRAM including any partially filled 256-bit words. The counter is reset to zero if dwu_go_pulse is
one.

// Up-Counter Logic
if (dwu_go_pulse == 1) then
    up_cnt[10:0] = 0
elsif (line_fin == 1) then
    // round up to the next 256-bit word if the current word is partially filled
    if (up_cnt[8:1] != 0) then
        up_cnt[10:9]++
    else
        up_cnt[10:9] = up_cnt[10:9]     // already on a word boundary
    // bit-selector
    up_cnt[8:0] = 0
elsif ((dnc_dwu_avail == 1) AND (dwu_dnc_ready == 1)) then
    up_cnt[10:0]++
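The up counter's end-of-line rounding can be modelled in Python as below. This is a sketch; UpCounter and its methods are hypothetical names. The counter is kept to 11 bits, so the 256-bit word field up_cnt[10:9] wraps modulo 4.

```python
class UpCounter:
    """Sketch of the 11-bit up counter used for increasing sense data."""

    def __init__(self) -> None:
        self.cnt = 0                         # up_cnt[10:0]

    def go_pulse(self) -> None:
        self.cnt = 0

    def dot(self) -> None:
        """One dot accepted (dnc_dwu_avail and dwu_dnc_ready both high)."""
        self.cnt = (self.cnt + 1) & 0x7FF

    def line_fin(self) -> None:
        """End of line: round up to the next 256-bit word and clear the bit selector."""
        word = (self.cnt >> 9) & 0x3         # up_cnt[10:9]
        if (self.cnt >> 1) & 0xFF:           # up_cnt[8:1] != 0: partially filled word
            word = (word + 1) & 0x3
        self.cnt = word << 9                 # up_cnt[8:0] = 0
```

A 400 dot line leaves the counter at the start of 256-bit buffer word 1 (value 512), so the partially filled word 0 is flushed and the next line starts on a fresh word.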



30.7.5.4 Down counter generator

The down counter logic decrements for each new dot and is used to determine the write position of the dot
in the DIU buffers for decreasing sense data. When the dwu_go_pulse signal is one the lower bits (i.e. 8 to 0)
of the counter are reset to the line size value (line_size), and the higher bits to zero. The bits used to determine
the bit-write values and 64-bit word addresses in the DIU buffers begin at line size and count down to zero.
The remaining higher bits, which determine the DIU buffer 256-bit address and buffer fill level, begin
at zero and count up. The counter is active when valid dot data is present, i.e. when dnc_dwu_avail equals 1.

When the end of line is detected (line_fin equals 1) the counter is rounded up to the next 256-bit word, and the
lower bits are reset to the line size value.

//Down-Counter Logic
if (dwu_go_pulse == 1) then
    dn_cnt[8:0]  = line_size[8:0]
    dn_cnt[10:9] = 0
elsif (line_fin == 1) then
    // perform rounding up
    if (dn_cnt[8:1] != 0) then
        dn_cnt[10:9]++
    else
        dn_cnt[10:9] = dn_cnt[10:9]
    // bit-select is reset
    dn_cnt[8:0] = line_size[8:0]        // bit select bits
elsif ((dnc_dwu_avail == 1) AND (dwu_dnc_ready == 1)) then
    // the bit select bits count down; the 256-bit word bits count up
    // each time the bit select bits wrap below zero
    if (dn_cnt[8:0] == 0) then
        dn_cnt[10:9]++
    dn_cnt[8:0]--



30.7.5.5 Dot counter

The dot counter simply counts each active dot received from the DNC. It sets the counter to line_size and
decrements it each time a valid dot is received. When the count equals zero the line_fin signal is pulsed and
the counter is reset to line_size.

The counter is reset to line_size when dwu_go_pulse is 1.

30.7.6 DIU buffer

The DIU buffer is a 64 bit x 8 word dual port register array with bit write capability. The buffer could be
implemented with flip-flops should it prove more efficient.
30.7.7 DIU interface
Figure 223. DIU interface sub-block



30.7.7.1 DIU interface general description

The DIU interface determines when a buffer needs a data word to be transferred to DRAM. It generates the
DRAM address based on the dot line position, the color base address and the other programmed parameters.
A write request is made to DRAM and when acknowledged a 256-bit data word is transferred. The
interface determines if further words need to be transferred and repeats the transfer process.

If the FIFO in DRAM has reached its maximum level or one of the buffers has temporarily filled, the
DWU will stall data generation from the DNC.

A similar process is repeated for each line until the end of page is reached. At the end of a page the CPU is 
required to reset the internal state of the block before the next page can be printed. A low to high transition 
of the Go register will cause the internal block reset, which causes all registers in the block to reset with 
the exception of the configuration registers. The transition is indicated to subblocks by a pulse on the
dwu_go_pulse signal.
30.7.7.2 Interface controller
Figure 224. Interface controller state diagram

The interface controller state machine waits in the Idle state until an active request is indicated by the read
pointer (via the req_active signal). When an active request is received the machine proceeds to the
ColorSelect state to determine which buffers need a data transfer. In the ColorSelect state it cycles through
each color and determines if the color is enabled (and consequently the buffer needs servicing). If enabled,
it jumps to the Request state; otherwise color_cnt is incremented and the next color is checked.

In the Request state the machine issues a write request to the DIU and waits in the Request state until the
write request is acknowledged by the DIU (diu_dwu_wack). Once an acknowledge is received the state
machine clocks through 4 cycles transferring 64-bit data words each cycle and incrementing the
corresponding buffer read address. After transferring the data to the DIU the machine returns to the ColorSelect
state to determine if further buffers need servicing. On the transition the controller indicates to the address
generator (adr_update) to update the address for that selected color.

If all colors are transferred (color_cnt equal to 6) the state machine returns to Idle, updating the last word
flags (group_fin) and request logic (req_update).

The dwu_diu_wvalid signal is a delayed version of the buf_rd_en signal to allow for pipeline delays
between data leaving the buffer and being clocked through to the DIU block.

The state machine will return from any state to Idle if the reset or the dwu_go_pulse is 1.
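The order in which one request is serviced can be sketched in Python as follows. This hypothetical model collapses the wait for diu_dwu_wack in the Request state into a single step and lists each 64-bit beat as a (req_sel, beat) pair, with req_sel built as {color_cnt[2:0], odd_even_sel}.

```python
def service_group(color_enable: int, odd_even_sel: int) -> list[tuple[int, int]]:
    """List the (req_sel, beat) pairs issued for one group of six half colors."""
    transfers = []
    for color_cnt in range(6):                      # ColorSelect: check each color in turn
        if not (color_enable >> color_cnt) & 1:
            continue                                # disabled: color_cnt is incremented
        req_sel = (color_cnt << 1) | odd_even_sel   # {color_cnt[2:0], odd_even_sel}
        # Request: after diu_dwu_wack, four 64-bit words are clocked out
        for beat in range(4):
            transfers.append((req_sel, beat))
        # then back to ColorSelect (adr_update for this color)
    return transfers                                # color_cnt == 6: return to Idle
```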
30.7.7.3 Address generator

The address generator block maintains 12 pointers (color_adr[11:0]) to DRAM corresponding to the current
write address in the dot line store for each half color. When a DRAM transfer occurs the address pointer is
used first and then updated for the next transfer for that color. The pointer used is selected by the req_sel
bus, and the pointer update is initiated by the adr_update signal from the interface controller.

The pointer update is dependent on the sense of the color of that pointer, the pointer position in a line and
the line position in the FIFO. The programming of the color_base_adr needs to be adjusted depending on
the sense of the colors. For increasing sense colors the color_base_adr specifies the address of the first
word of the first line of the FIFO, whereas for decreasing sense colors the color_base_adr specifies the address
of the last word of the first line of the FIFO.

For increasing colors, the initialization value (i.e. when dwu_go_pulse is 1) is the color_base_adr. For
each word that is written to DRAM the pointer is incremented. If the word is the last word in a line (as
indicated by last_wd from the read pointer), the pointer is likewise incremented. If the word is the last word in
a line, and the line is the last line in the FIFO (indicated by fifo_end from the line counter), the pointer is
reset to color_base_adr.

In the case of decreasing sense colors, the initialization value (i.e. when dwu_go_pulse is 1) is the
color_base_adr. For each line of decreasing sense color data the pointer starts at the line end and
decrements to the line start. For each word that is written to DRAM the pointer is decremented. If the word is
the last word in a line, the pointer is incremented by color_line_inc * 2 - 1: one line length to account for
the line of data just written, and another line length for the next line to be written, less one word because
the pointer must land on the last word of the next line rather than on the first word of the line after it. If the
word is the last word in a line, and the line is the last line in the FIFO, the pointer is reset to the
initialization value (i.e. color_base_adr).

The address is calculated as follows:

if (dwu_go_pulse == 1) then
    color_adr[11:0] = color_base_adr[11:0][21:5]
elsif (adr_update == 1) then {
    // determine the color
    color = req_sel[3:0]
    if ((fifo_end[color] == 1) AND (last_wd == 1)) then {
        // line end and fifo wrap
        color_adr[color] = color_base_adr[color][21:5]
    }
    elsif (last_wd == 1) then {
        // just a line end, no fifo wrap
        if (color_line_sense[color % 2] == 1) then      // increasing sense
            color_adr[color]++
        else                                            // decreasing sense
            color_adr[color] = color_adr[color] + (color_line_inc * 2) - 1
    }
    else {
        // regular word write
        if (color_line_sense[color % 2] == 1) then      // increasing sense
            color_adr[color]++
        else                                            // decreasing sense
            color_adr[color]--
    }
}
// select the correct address for this transfer
dwu_diu_wadr = color_adr[req_sel]
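The pointer rules above can be checked with a short Python model that lists the DRAM word address of every 256-bit write for one half color. It is a sketch under stated assumptions: word_addresses is a hypothetical name, each half line occupies exactly line_inc words, and fifo_lines (the number of lines before the wrap) stands in for the fifo_end flag from the line counter.

```python
def word_addresses(base_adr: int, line_inc: int, fifo_lines: int,
                   increasing: bool, lines: int) -> list[int]:
    """DRAM word address of each 256-bit write for one half color, in issue order."""
    adr = base_adr                                       # value loaded on dwu_go_pulse
    issued = []
    for line in range(lines):
        fifo_end = line % fifo_lines == fifo_lines - 1   # last line before the wrap
        for word in range(line_inc):
            issued.append(adr)                           # the pointer is used first ...
            last_wd = word == line_inc - 1
            if last_wd and fifo_end:                     # ... then updated: FIFO wrap
                adr = base_adr
            elif last_wd:                                # line end, no FIFO wrap
                adr = adr + 1 if increasing else adr + 2 * line_inc - 1
            else:                                        # regular word write
                adr = adr + 1 if increasing else adr - 1
    return issued
```

With line_inc = 3 and two lines per FIFO, an increasing-sense color based at word 100 and a decreasing-sense color based at word 102 (the last word of its first line) cover the same words 100 to 105, in opposite orders within each line.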
30.7.7.4 Line count

The line counter logic counts the number of dot data lines stored in DRAM for each half color. A separate
pointer is maintained for each half color. A line pointer is updated each time the final word of a line is
transferred to DRAM. This is determined by a combination of the adr_update and last_wd signals. The pointer to
update is indicated by the req_sel bus.

When an update occurs to a pointer it is compared to zero; if it is non-zero the count is decremented,
otherwise the counter is reset to color_fifo_size. If a counter is zero the fifo_end signal is set high to indicate
to the address generator block that the line is the last line of this color's FIFO.

If the dwu_go_pulse signal is one the counters are reset to color_fifo_size.

if (dwu_go_pulse == 1) then
    line_cnt[11:0] = color_fifo_size[11:0]
elsif ((adr_update == 1) AND (last_wd == 1)) then {
    // determine the pointer to operate on
    color = req_sel[3:0]
    // update the pointer
    if (line_cnt[color] == 0) then
        line_cnt[color] = color_fifo_size[color]
    else
        line_cnt[color]--
}
// count is zero: it is the last line of the fifo
for (i = 0; i < 12; i++) {
    fifo_end[i] = (line_cnt[i] == 0)
}
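Read literally, the line counter pseudocode steps through color_fifo_size, ..., 1, 0 before reloading, so fifo_end is seen once every color_fifo_size + 1 lines. The hypothetical Python function below reproduces that sequence; whether ColorFifoSize is intended to be programmed as the line count minus one is not stated in this section.

```python
def fifo_end_flags(color_fifo_size: int, lines: int) -> list[bool]:
    """fifo_end as seen for each successive line of one half color."""
    line_cnt = color_fifo_size                  # reset value on dwu_go_pulse
    flags = []
    for _ in range(lines):
        flags.append(line_cnt == 0)             # fifo_end[i] = (line_cnt[i] == 0)
        # update when the final word of the line is transferred (adr_update AND last_wd)
        line_cnt = color_fifo_size if line_cnt == 0 else line_cnt - 1
    return flags
```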

30.7.7.5 Read pointer

The read pointer logic maintains the buffer read address pointers. The read pointer is used to determine 
which 64-bit words to read from the buffer for transfer to DRAM. 

The read pointer logic compares the read and write pointers of each DIU buffer to determine which buffers
require data to be transferred to DRAM (pend[11:0] bus), and which buffers are full (the buf_full signal).
Only enabled buffers are considered, as indicated by the color_enable bus.

Buffers are grouped into odd and even buffer groups. If an odd buffer requires DRAM access the
odd_pend signal will be active; if an even buffer requires DRAM access the even_pend signal will be
active. If both odd and even buffers require DRAM access, the even buffers will get serviced first.

If any buffer requires a DRAM transfer, the logic will indicate this to the interface controller via the req_active
signal, with the odd_even_sel signal determining which group of buffers gets serviced. The interface
controller will check the color_enable signal and issue DRAM transfers for all enabled colors in a group.
When the transfers are complete it tells the read pointer logic to update the requests pending via the
req_update signal.

The req_sel[3:0] signal tells the address generator which buffer is being serviced. It is constructed from
the odd_even_sel signal and the color_cnt[2:0] bus from the interface controller. When data is being
transferred to DRAM the word pointer and read pointer for the corresponding buffer are updated. The req_sel
signal determines which pointer should be incremented.
// determine which buffers need updates
buf_full = 0        // default, set below if any enabled buffer is full
for (i = 0; i < 12; i++) {
    // determine if request is active, filtered by color enable
    if (wr_adr[i][3:2] != rd_adr[i][3:2]) then
        pend[i] = color_enable[i / 2]
    else
        pend[i] = 0
    // determine if any enabled buffer is full
    if (((wr_adr[i][3:0] - rd_adr[i][3:0]) > 7) AND (color_enable[i / 2] == 1)) then
        buf_full = 1
}
// Odd half colors (1,3,5,7,9,11), even half colors (0,2,4,6,8,10)
odd_pend  = (pend[1] | pend[3] | pend[5] | pend[7] | pend[9] | pend[11])
even_pend = (pend[0] | pend[2] | pend[4] | pend[6] | pend[8] | pend[10])
// fixed servicing order, only update when controller dictates so
if (req_update == 1) then {
    if (even_pend == 1) then        // even always first
        odd_even_sel = 0
        req_active = 1
    elsif (odd_pend == 1) then      // then check odd
        odd_even_sel = 1
        req_active = 1
    else                            // nothing active
        odd_even_sel = 0
        req_active = 0
}
// selected requestor
req_sel[3:0] = {color_cnt[2:0], odd_even_sel}   // concatenation
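The pending and arbitration logic above can be sketched in Python as follows. arbitrate is a hypothetical name; it evaluates the group selection as if req_update were high, and it treats the 4-bit buffer addresses as wrapping modulo 16 when testing for a full buffer, which is an assumption about the subtraction in the pseudocode.

```python
def arbitrate(wr_adr: list[int], rd_adr: list[int], color_enable: int):
    """Return (pend, buf_full, odd_even_sel, req_active) for the 12 DIU buffers."""
    pend = []
    buf_full = False
    for i in range(12):
        enabled = bool((color_enable >> (i // 2)) & 1)   # half color i belongs to color i // 2
        # a 256-bit word is waiting when the word indices (address bits 3:2) differ
        waiting = ((wr_adr[i] >> 2) & 0x3) != ((rd_adr[i] >> 2) & 0x3)
        pend.append(enabled and waiting)
        # more than 7 unread 64-bit words in an enabled buffer means it is full
        if enabled and ((wr_adr[i] - rd_adr[i]) & 0xF) > 7:
            buf_full = True
    even_pend = any(pend[0::2])    # half colors 0, 2, ..., 10
    odd_pend = any(pend[1::2])     # half colors 1, 3, ..., 11
    if even_pend:                  # even group is always serviced first
        return pend, buf_full, 0, True
    if odd_pend:
        return pend, buf_full, 1, True
    return pend, buf_full, 0, False
```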

The read address pointer logic consists of 12 2-bit counters and a word select pointer. The pointers are
reset when dwu_go_pulse is one. The word pointer (word_ptr) is common to all buffers and is used to read
out the 64-bit words from the DIU buffer. It is incremented when buf_rd_en is active. If word_ptr is 3
and buf_rd_en is active, the selected read pointer (rd_ptr[req_sel]) will be incremented. A concatenation
of the read pointer and the word pointer is used to construct the buffer read address. The read pointers
are not reset at the end of each line.
// determine which pointer to update
if (dwu_go_pulse == 1) then
    rd_ptr[11:0] = 0
    word_ptr = 0
elsif (buf_rd_en == 1) then {
    // the selected read pointer advances as the fourth 64-bit word is read
    if (word_ptr == 3) then
        rd_ptr[req_sel]++
    word_ptr++          // 2-bit pointer, wraps from 3 back to 0
}
// create the address from the read pointer and word pointer
rd_adr[req_sel] = {rd_ptr[req_sel], word_ptr}   // concatenation

The read pointer block determines if the word being read from the DIU buffers is the last word of a line.
The buffer address generator indicates that the last dot is being written into the buffers via the line_fin signal.
When this is received, the logic marks the 256-bit word in the buffers as the last word. When the last word is read
from the DIU buffer and transferred to DRAM, the flag for that word is reflected to the address generator.

// line end: set the flags
if (dwu_go_pulse == 1) then
    last_flag[1:0][1:0] = 0
elsif (line_fin == 1) then
    // determine the current 256-bit word the even group is being written to
    last_flag[0][wr_adr[0][2]] = 1     // even group flag
    // determine the current 256-bit word the odd group is being written to
    last_flag[1][wr_adr[1][2]] = 1     // odd group flag
// last word reflection to address generator
last_wd = last_flag[odd_even_sel][rd_ptr[req_sel][0]]
// clear the flag
if (group_fin == 1) then
    last_flag[odd_even_sel][rd_ptr[req_sel][0]] = 0

When a complete line has been written into the DIU buffers (but has not yet been transferred to DRAM),
the buffer address generator block will pulse the line_fin signal. The DWU must wait until all enabled
buffers are transferred to DRAM before signaling the LLU that a complete line is available in the dot line
store (dwu_llu_line_wr signal). When line_fin is received all buffers will require transfer to DRAM.

Due to the arbitration, the even group will get serviced first and then the odd group. As a result the line
finish pulse to the LLU is generated from the last_flag of the odd group.

// must be odd, odd group transfer complete and the last word
dwu_llu_line_wr = odd_even_sel AND group_fin AND last_wd
31 Line Loader Unit (LLU)

31.1 Overview



The Line Loader Unit (LLU) reads dot data from the line buffers in DRAM and structures the data into
even and odd dot channels destined for the same print time. The blocks of dot data are transferred to the
PHI and then to the printhead. Figure 225 shows a high level data flow diagram of the LLU in context.



Figure 225. High level data flow diagram of LLU in context (blocks: DWU, DRAM via DIU, LLU, PHI; dot data and control paths)



31.2 Physical requirements imposed by the printhead

The DWU re-orders dot data into 12 separate dot data line FIFOs in the DRAM, one for the odd and one for
the even half of each of the 6 colors. The LLU reads the dot data line FIFOs and sends the data to the printhead
interface. The LLU decides when data should be read from the dot data line FIFOs to correspond with the 
time that the particular nozzle on the printhead is passing the current line. The interaction of the DWU and 
LLU with the dot line FIFOs compensates for the physical spread of nozzles firing over several lines at 
once. For further explanation see Section 30 Dotline Writer Unit (DWU) and Section 32 PrintHead Inter- 
face (PHI). Figure 226 shows the physical relationship of nozzle rows and the line time the LLU starts 
reading from the dot line store. 
Figure 226. Paper and printhead nozzles relationship (example with D<|SD2=5)
Within each line of dot data the LLU is required to generate an even and odd dot data stream to the PHI
block. Figure 227 shows the even and odd dot streams as they would map to an example bi-lithic printhead.
The PHI block determines which stream should be directed to which printhead IC.
Note: Paper passing under printhead
Figure 227. Printhead structure and dot generate order 
31.3 Dot generate and transmit order

The structure of the printhead ICs dictates the dot transmit order to each printhead IC. The LLU reads data
from the dot line FIFO and generates an even and odd dot stream, which is then re-ordered (in the PHI) into the
transmit order for transfer to the printhead.

The DWU separates dot data into even and odd half lines for each color and stores them in DRAM. It can
store odd or even dot data in increasing or decreasing order in DRAM. The order is programmable, but for
descriptive purposes assume even in increasing order and odd in decreasing order. The dot order structure
in DRAM is shown in Figure 219.

The LLU contains 2 dot generator units. Each dot generator reads dot data from DRAM and generates a
stream of odd or even dots. The dot order may be increasing or decreasing depending on how the DWU
was programmed to write data to DRAM. An example of the even and odd dot data streams from DRAM is
shown in Figure 228. In the example the odd dot generator is configured to produce odd dot data in
decreasing order and the even dot generator produces dot data in increasing order.
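As a sketch of the stream generation described above, the Python function below splits one line of dot indices into even and odd streams. It assumes 0-based dot numbering (dot 0 is even) and defaults to the example configuration of even increasing and odd decreasing; dot_streams is a hypothetical name and does not model the PHI re-ordering.

```python
def dot_streams(line_dots: list[int], even_increasing: bool = True,
                odd_increasing: bool = False) -> tuple[list[int], list[int]]:
    """Split a line into (even stream, odd stream) in the configured dot order."""
    even = line_dots[0::2]          # dots 0, 2, 4, ...
    odd = line_dots[1::2]           # dots 1, 3, 5, ...
    if not even_increasing:
        even.reverse()
    if not odd_increasing:
        odd.reverse()
    return even, odd
```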

The PHI block accepts the even and odd dot data streams and reconstructs the streams into transmit order 
to the printhead. 

The LLU line size refers to the page width in dots and not necessarily the printhead width. The page width
is often narrower than the printhead width by the dot margin. The two can be the same size for full
bleed printing.
Figure 228. Dot data generated and transmitted order



31.4 LLU START-UP 

At the start of a page the LLU must wait for the dot line store in DRAM to fill to a configured level (given
by FifoReadThreshold) before starting to read dot data. Once the LLU starts processing dot data for a page
it must continue until the end of the page; the DWU (and other PEP blocks in the pipeline) must ensure there
is always data in the dot line store for the LLU to read, otherwise the LLU will stall, causing the PHI to
stall and potentially generate a print error. The FifoReadThreshold should be chosen to allow for data rate
mismatches between the DWU write side and the LLU read side of the dot line FIFO. The LLU will not
generate any dot data until the FifoReadThreshold level in the dot line FIFO is reached.

Once the FifoReadThreshold is reached the LLU begins page processing; the FifoReadThreshold is
ignored from then on.
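The start-up rule amounts to a one-way latch, sketched below in Python. LluStartup is a hypothetical name, and the comparison assumes the threshold counts as reached when the fill level is greater than or equal to FifoReadThreshold.

```python
class LluStartup:
    """One-way start latch for page processing in the LLU (hypothetical model)."""

    def __init__(self, fifo_read_threshold: int) -> None:
        self.fifo_read_threshold = fifo_read_threshold
        self.started = False

    def may_read(self, fill_level: int) -> bool:
        """Return True once page processing has started."""
        if not self.started and fill_level >= self.fifo_read_threshold:
            self.started = True                 # threshold is ignored from now on
        return self.started
```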

When the LLU begins page processing it produces dot data for all colors (although the dot data for some colors
may be null data). The LLU compares the line count of the current page with the ColorRelLine value of each
color; when the line count exceeds the ColorRelLine configured value for a particular color, the LLU will start
reading from that color's FIFO in DRAM. For colors that have not exceeded the ColorRelLine value the LLU will
generate null data (zero data) and not read from DRAM for that color. ColorRelLine[N] specifies the number of
lines separating the Nth half color and the first half color to print on that page.

For the example printhead shown in Figure 226, color 0 odd will start at line 0 and the remaining colors will
all have null data. Color 0 odd will continue with real data until line 5, when color 0 odd and even will both
contain real data and the remaining colors will contain null data. At line 10, color 0 odd and even and color 1
odd will contain real data, with the remaining colors containing null data. Every 5 lines a new half color will
contain real data and the remaining half colors null data until line 55, when all colors will contain real
data. In the example ColorRelLine[0] = 5, ColorRelLine[1] = 0, ColorRelLine[2] = 15, ColorRelLine[3]
It is possible to turn off any one of the color planes of data (via the ColorEnable register). In such cases the
LLU will generate zeroed dot data information to the PHI as normal but will not read data from the
DRAM.
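The per-line choice between DRAM data and null data can be sketched in Python as below. half_color_sources is a hypothetical helper. Following the worked example (color 0 odd, with ColorRelLine[1] = 0, has real data from line 0), it treats a half color as active once the line count reaches its ColorRelLine value; half color N is taken to belong to color N // 2, and a color switched off in ColorEnable always produces null data. EXAMPLE holds the first four values implied by the worked example.

```python
def half_color_sources(line: int, color_rel_line: list[int], color_enable: int) -> list[bool]:
    """Per half color: True if dot data is read from DRAM, False for null (zero) data."""
    sources = []
    for n, rel in enumerate(color_rel_line):
        enabled = bool((color_enable >> (n // 2)) & 1)   # ColorEnable has one bit per color
        sources.append(enabled and line >= rel)
    return sources


# ColorRelLine[0..3] as implied by the worked example above.
EXAMPLE = [5, 0, 15, 10]
```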



31.4.1 LLU bandwidth requirements 

The LLU is required to generate data for feeding to the printhead interface; the rate required depends on the printhead construction and on the configured line rate. The maximum data rate the LLU can produce is 12 bits of dot data per cycle, but the PHI consumes 12 bits per phiclk cycle (2/3 of the pclk rate), i.e. 8 bits per pclk cycle. Therefore the DRAM bandwidth requirement for a double buffered LLU is 8 bits per cycle on average. If 1.5 buffering is used then the peak bandwidth requirement doubles to 16 bits per cycle, but the average remains at 8 bits per cycle. Note that while the LLU and PHI could produce data at the 8 bits per cycle rate, the DWU can only produce data at a rate of 6 bits per cycle.
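The bandwidth figures above follow from simple rate arithmetic. The Python sketch below reproduces them, assuming only the numbers quoted in the text (12 bits per phiclk cycle, phiclk at 2/3 of pclk, and the peak doubling under 1.5 buffering); the constant and function names are illustrative, not taken from the design.

```python
from fractions import Fraction

PHI_BITS_PER_PHICLK = 12           # PHI consumes 12 bits of dot data per phiclk cycle
PHICLK_PER_PCLK = Fraction(2, 3)   # phiclk runs at 2/3 of the pclk rate


def average_dram_bits_per_pclk() -> Fraction:
    """Average DRAM read bandwidth needed by the LLU, in bits per pclk cycle."""
    return PHI_BITS_PER_PHICLK * PHICLK_PER_PCLK


def peak_dram_bits_per_pclk_1p5_buffered() -> Fraction:
    """Peak bandwidth with 1.5 buffering, which the text gives as twice the average."""
    return 2 * average_dram_bits_per_pclk()


if __name__ == "__main__":
    print("average:", average_dram_bits_per_pclk(), "bits/pclk")
    print("peak (1.5 buffering):", peak_dram_bits_per_pclk_1p5_buffered(), "bits/pclk")
```

Using `Fraction` keeps the 2/3 clock ratio exact, so the results compare cleanly against the whole numbers quoted in the text.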
31.5 Implementation



31.5.1 LLU partition 
Figure 229. LLU partition



31.5.2 Definitions of I/O



Table 156. LLU I/O definition

Port name                      Pins   I/O   Description
Clocks and Resets
pclk                           1      In    System clock
prst_n                         1      In    System reset, synchronous active low
PHI Interface
llu_phi_data[1:0][5:0]         2x6    Out   Dot data from LLU to the PHI; each bit is a color plane, 5 downto 0.
                                            Bus 0 - Even dot data stream
                                            Bus 1 - Odd dot data stream
                                            Data is active when the corresponding bit is active in the llu_phi_avail bus.
phi_llu_ready[1:0]             2      In    Indicates that the PHI is ready to accept data from the LLU.
                                            0 - Even dot data stream
                                            1 - Odd dot data stream
llu_phi_avail[1:0]             2      Out   Indicates valid data present on the corresponding llu_phi_data bus.
                                            0 - Even dot data stream
                                            1 - Odd dot data stream
DIU Interface
llu_diu_rreq                   1      Out   LLU requests DRAM read. A read request must be accompanied by a valid read address.
llu_diu_radr[21:5]             17     Out   Read address to DIU, 17 bits wide (256-bit aligned word).
diu_llu_rack                   1      In    Acknowledge from DIU that the read request has been accepted and a new read address can be placed on llu_diu_radr.
diu_data[63:0]                 64     In    Data from DIU to LLU. Each access is 256 bits received over 4 clock cycles.
                                            First 64 bits is bits 63:0 of the 256-bit word
                                            Second 64 bits is bits 127:64 of the 256-bit word
                                            Third 64 bits is bits 191:128 of the 256-bit word
                                            Fourth 64 bits is bits 255:192 of the 256-bit word
diu_llu_rvalid                 1      In    Signal from DIU telling the LLU that valid read data is on the diu_data bus.
DWU Interface
dwu_llu_line_wr                1      In    DWU line write. Indicates that the DWU has completed a full line write. Active high.
llu_dwu_line_rd                1      Out   LLU line read. Indicates that the LLU has completed a line read. Active high.
dwu_llu_cfifosize[11:0][7:0]   12x8   In    Indicates the number of lines in the FIFO before the line increment will wrap around in memory.
PCU Interface
pcu_llu_sel                    1      In    Block select from the PCU. When pcu_llu_sel is high both pcu_adr and pcu_dataout are valid.
pcu_rwn                        1      In    Common read/not-write signal from the PCU.
pcu_adr[7:2]                   6      In    PCU address bus. Only 6 bits are required to decode the address space for this block.
pcu_dataout[31:0]              32     In    Shared write data bus from the PCU.
llu_pcu_rdy                    1      Out   Ready signal to the PCU. When llu_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on llu_pcu_data is valid.
llu_pcu_data[31:0]             32     Out   Read data bus to the PCU.



31.5.3 Configuration registers 

The configuration registers in the LLU are programmed via the PCU interface. Refer to section 21.8.2 on page 257 for a description of the protocol and timing diagrams for reading and writing registers in the LLU. Note that since addresses in SoPEC are byte aligned and the PCU only supports 32-bit register reads and writes, the lower 2 bits of the PCU address bus are not required to decode the address space for the LLU. When reading a register that is less than 32 bits wide, zeros should be returned on the upper unused bit(s) of llu_pcu_data. Table 157 lists the configuration registers in the LLU.



Table 157. LLU registers description

Address     Register             #bits   Reset value   Description
Control Registers
0x00        Reset                1       0x1           Active low synchronous reset, self de-activating. A write to this register will cause an LLU block reset.
0x04        Go                   1       0x0           Active high bit indicating the LLU is programmed and ready to use. A low to high transition will cause the LLU block internal states to reset.
Configuration Registers
0x08-0x38   ColorBaseAdr[11:0]   12x17   0x00000       Specifies the base address (in words) in memory where data from a particular half color (N) will be placed.
0x30        ColorEnable          6       0x3F          Indicates whether a particular color is active or not. When inactive no data is read from DRAM for that color.
                                                       0 - Color off
                                                       1 - Color on
                                                       One bit per color, bit 0 is Color 0 and so on.
0x40        LineSize             16      0x0000        Indicates the number of dots per line.
0x44        FifoReadThreshold    8       0x00          Specifies the number of lines that should be in the FIFO before the LLU starts reading.
0x48-0x78   ColorRelLine[11:0]   12x8    0x00          Specifies the relative number of lines to wait from the first half color before starting to read dot data from the corresponding dot data FIFO.
                                                       Bus 0,1 - Even, Odd line color 0
                                                       Bus 2,3 - Even, Odd line color 1
                                                       Bus 4,5 - Even, Odd line color 2
                                                       Bus 6,7 - Even, Odd line color 3
                                                       Bus 8,9 - Even, Odd line color 4
                                                       Bus 10,11 - Even, Odd line color 5
Working Registers
0x7C        FifoFillLevel        8       0x00          Number of lines in the dot line FIFO, i.e. lines written in but not yet read out. (Read only)



A low to high transition of the Go register causes the internal states of the LLU to be reset. All configuration registers will remain the same. The block indicates the transition to other blocks via the llu_go_pulse signal.
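Generating llu_go_pulse amounts to a rising-edge detector on the Go register. The Python sketch below models it at cycle level. It assumes a single-cycle pulse and a Go value of 0 before the first sample (its reset value). The function name and the list-of-samples representation are illustrative only.

```python
def go_pulses(go_samples: list[int]) -> list[int]:
    """Return llu_go_pulse for each cycle, given the Go register value per cycle.

    A pulse is produced in the cycle where Go is 1 and was 0 in the previous
    cycle. Go is assumed to be 0 before the first sample (its reset value).
    """
    previous = [0] + go_samples[:-1]
    return [int(cur == 1 and prev == 0) for prev, cur in zip(previous, go_samples)]
```

Holding Go at 1 produces only one pulse, so the CPU must write 0 and then 1 again to restart the LLU for the next page.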



The ColorLineInc bus specifies the number of addresses (in 256-bit words) between successive half lines in the dot line store, and is used to determine when a half line of data has been read from DRAM. It is derived from the LineSize register by rounding up to the nearest 256-bit value. The same value is used for all half colors.

if (line_size[7:0] != 0) then
   color_line_inc[7:0] = line_size[15:8] + 1
else
   color_line_inc[7:0] = line_size[15:8]
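The same rounding rule can be written in Python to check it. This is a sketch, and the function name is hypothetical. For an A4 line of 13824 dots, LineSize[7:0] is zero, so ColorLineInc is simply 13824 / 256 = 54.

```python
def color_line_inc(line_size: int) -> int:
    """Derive the 8-bit ColorLineInc value from the 16-bit LineSize register.

    Mirrors the pseudocode: take line_size[15:8] and add one whenever any of
    line_size[7:0] is set, i.e. round up to the next multiple of 256.
    """
    line_size &= 0xFFFF              # LineSize is a 16-bit register
    upper = line_size >> 8           # line_size[15:8]
    if line_size & 0xFF:             # line_size[7:0] != 0
        upper += 1
    return upper & 0xFF              # color_line_inc is 8 bits wide
```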
31.5.4 Dot generator




Figure 230. Dot generator RTL diagram



The dot generator block is responsible for reading dot data from the DIU buffers and sending the dot data in the correct order to the PHI block. The dot generator waits for the llu_en signal from the fifo fill level block. Once llu_en is active, it starts reading data from the 6 DIU buffers and generating dot data for feeding to the PHI.

In the LLU there are two instances of the dot generator, one generating odd data and the other generating even data.

The ready bit from the PHI could be de-asserted at any time. If this happens the dot generator will stop generating data and wait for the ready bit to be re-asserted.



31.5.4.1 Dot count

In normal operation the dot counter will wait for llu_en and the ready signal to be active before starting to count. The dot counter will produce data as long as phi_llu_ready is active. If the phi_llu_ready signal goes low the count will be stalled.

The dot counter increments for each dot that is processed per line. It is used to determine the line finish position, and the bit select value for reading from the DIU buffers. The counter is reset after each line is processed (line_fin signal). It determines when a line is finished by comparing the dot count with the configured line size divided by 2 (note that odd numbers of dots will be rounded down).

// define the line finish
if (dot_cnt[14:0] == line_size[15:1]) then
   line_fin = 1
else
   line_fin = 0
// determine if word is valid
dot_active = ((llu_en == 1) AND (phi_llu_ready == 1) AND (buf_emp == 0))
// counter logic
if (llu_go_pulse == 1) then
   dot_cnt = 0
elsif ((dot_active == 1) AND (line_fin == 1)) then
   dot_cnt = 0
elsif (dot_active == 1) then
   dot_cnt = dot_cnt + 1
else
   dot_cnt = dot_cnt
// calculate the word select bits
bit_sel[5:0] = dot_cnt[5:0]
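The following is a cycle-level Python sketch of the dot counter pseudocode, useful for checking the stall and line-finish behaviour. The class and method names are hypothetical. It assumes a 15-bit dot_cnt, following the dot_cnt[14:0] comparison, and it keeps the pseudocode's priority order: llu_go_pulse first, then line finish, then increment.

```python
from dataclasses import dataclass


@dataclass
class DotCounter:
    """Cycle-level sketch of the dot counter pseudocode for one dot generator."""

    line_size: int   # LineSize register value (dots per line)
    dot_cnt: int = 0

    def line_fin(self) -> bool:
        # line finishes when dot_cnt[14:0] == line_size[15:1]
        return (self.dot_cnt & 0x7FFF) == ((self.line_size & 0xFFFF) >> 1)

    def bit_sel(self) -> int:
        # word select bits: bit_sel[5:0] = dot_cnt[5:0]
        return self.dot_cnt & 0x3F

    def clock(self, llu_en: bool, phi_llu_ready: bool, buf_emp: bool,
              llu_go_pulse: bool = False) -> None:
        """Advance one clock cycle using the same priority order as the pseudocode."""
        dot_active = llu_en and phi_llu_ready and not buf_emp
        if llu_go_pulse:
            self.dot_cnt = 0
        elif dot_active and self.line_fin():
            self.dot_cnt = 0                       # end of line: restart the count
        elif dot_active:
            self.dot_cnt = (self.dot_cnt + 1) & 0x7FFF
        # otherwise the count is stalled (held at its current value)
```

Deasserting phi_llu_ready, or an empty buffer (buf_emp), leaves dot_cnt unchanged, which models the stall described above.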
The dot generator also maintains a read buffer pointer, which is incremented each time a 64-bit word is processed. The pointer is used to address the correct 64-bit dot data word within the DIU buffers. The pointer is reset when llu_go_pulse is 1. Unlike the dot counter, the read pointer is not reset each line but is rounded up to the nearest 256-bit word. This allows more efficient use of the DIU buffers at line finish.
// read pointer logic
if (llu_go_pulse == 1) then
   read_adr = 0
elsif ((dot_active == 1) AND (dot_cnt[5:0] == 63)) then
   read_adr ++                    // normal increment
elsif ((dot_active == 1) AND (line_fin == 1)) then {
   // special end of line case
   if (dot_cnt[7:0] != 0) then
      read_adr[3:2] ++            // end of line round up
      read_adr[1:0] = 0
}
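The end-of-line round-up is easiest to see with concrete pointer values. The Python sketch below applies one update of the read pointer. It assumes a 4-bit pointer that wraps at 16: read_adr[1:0] selects the 64-bit word within a 256-bit word, and read_adr[3:2] selects the 256-bit word. This layout matches the slices used in the pseudocode, but the pointer width is an assumption. The function name is hypothetical.

```python
def next_read_adr(read_adr: int, dot_cnt: int, dot_active: bool,
                  line_fin: bool, llu_go_pulse: bool = False) -> int:
    """Return the read buffer pointer after one clock cycle.

    Assumes a 4-bit pointer: read_adr[1:0] selects a 64-bit word within a
    256-bit word and read_adr[3:2] selects the 256-bit word.
    """
    if llu_go_pulse:
        return 0
    if dot_active and (dot_cnt & 0x3F) == 63:
        return (read_adr + 1) & 0xF                 # normal increment
    if dot_active and line_fin and (dot_cnt & 0xFF) != 0:
        next_256 = ((read_adr >> 2) + 1) & 0x3      # read_adr[3:2]++
        return next_256 << 2                        # read_adr[1:0] = 0
    return read_adr
```

For example, a line that finishes part-way through the second 64-bit word of a 256-bit word (read_adr = 5) moves the pointer to 8, the start of the next 256-bit word.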



31.5.5 Fifo fill level



The LLU keeps a running total of the number of lines in the dot line store FIFO. Every time the DWU signals a line end (dwu_llu_line_wr active pulse), it increments the filllevel. Conversely, if the LLU detects a line end (line_rd pulse), the filllevel is decremented and the line read is signalled to the DWU via the llu_dwu_line_rd signal.

The LLU fill level block is used to determine when the dot line store has enough data stored for the LLU to start reading. At page start the LLU is disabled. It waits for the DWU to write lines to the dot line FIFO, and for the fill level to increase. The LLU remains disabled until the fill level has reached the programmed threshold (fifo_read_threshold). When the threshold is reached it signals the LLU to start processing (via llu_en). Once the LLU has started processing dot data for a page it will not stop if the filllevel falls below the threshold.

The line FIFO fill level can be read by the CPU via the PCU at any time by accessing the FifoFillLevel register. At page start the block is initialized by llu_go_pulse and the FIFO fill level is reset to zero.

if (llu_go_pulse == 1) then
   filllevel = 0
elsif ((line_rd == 1) AND (dwu_llu_line_wr == 1)) then
   // do nothing
elsif (line_rd == 1) then
   filllevel --
elsif (dwu_llu_line_wr == 1) then
   filllevel ++

// determine the threshold, and set the LLU going
if (llu_go_pulse == 1) then
   llu_en = 0
elsif (filllevel == fifo_read_threshold) then
   llu_en = 1
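The Python sketch below models the fill level counter and the llu_en enable together. The class name is hypothetical. It assumes both pseudocode processes sample the fill level from the start of the cycle, as two clocked processes would, so llu_en rises one cycle after the threshold is reached. The sketch also shows that llu_en stays set after the fill level drops again.

```python
class FifoFillLevel:
    """Sketch of the LLU fill level counter and the llu_en enable flag.

    Both updates sample the fill level from the start of the cycle, as two
    clocked processes would; this register-level timing is an assumption.
    """

    def __init__(self, fifo_read_threshold: int) -> None:
        self.fifo_read_threshold = fifo_read_threshold
        self.filllevel = 0
        self.llu_en = False

    def clock(self, line_rd: bool, dwu_llu_line_wr: bool,
              llu_go_pulse: bool = False) -> None:
        level = self.filllevel                    # value seen by both processes
        if llu_go_pulse:
            self.filllevel = 0
        elif line_rd and dwu_llu_line_wr:
            pass                                  # one line in, one out: no change
        elif line_rd:
            self.filllevel = level - 1
        elif dwu_llu_line_wr:
            self.filllevel = level + 1

        if llu_go_pulse:
            self.llu_en = False
        elif level == self.fifo_read_threshold:
            self.llu_en = True                    # sticky until the next llu_go_pulse
```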
31.5.6 DIU interface
Figure 231. DIU interface



31.5.6.1 DIU interface description



The DIU interface block is responsible for determining when dot data needs to be read from DRAM, keep- 
ing the dot generators supplied with data and calculating the DRAM read address based on configured 
parameters, FIFO fill levels and position in a line. 

The fill level block enables DIU requests by activating the llu_en signal. The DIU interface controller then issues requests to the DIU for the LLU buffers to be filled with dot line data (or fills the LLU buffers with null data without requesting DRAM access, if required).

At page start the DIU interface determines which buffers should be filled with null data and which should 
request DRAM access. New requests are issued until the dot line is completely read from DRAM. 

For each request to the DRAM, the address generator calculates where in the DRAM the dot data should be read from. The color_enable bus determines which colors are enabled; the interface never issues DRAM requests for disabled colors.
31.5.6.2 Interface controller



Machine remains in same state by default.
All outputs are zero unless otherwise stated.

State description:
Idle: Idle state, wait for active request
ColorSelect: Select the color to update before requesting to DIU
Request: Request issued, wait for acknowledge
Data0: Data word 0 transfer
Data1: Data word 1 transfer
Data2: Data word 2 transfer
Data3: Data word 3 transfer

Figure 232. Interface controller state diagram

The interface controller co-ordinates and issues requests for data transfers from DRAM. The state machine waits in the Idle state until it is enabled by the LLU controller (llu_en) and a request for data transfer is received from the write pointer block.

When an active request is received (req_active equals 1), the state machine jumps to the ColorSelect state to determine which colors (color_cnt) in the group need a data transfer. A group is defined as all odd colors or all even colors. If the color isn't enabled (color_enable), the count just increments and no data is transferred. If the color is enabled, the state machine takes one of two options: either a null data transfer or an actual data transfer from DRAM. A null data transfer writes zero data to the DIU buffer and does not issue a request to DRAM.

The state machine determines if a null transfer is required by checking the color_start signal for that color. If a null transfer is required, the state machine doesn't need to issue a request to the DIU and so jumps directly to the data transfer states (Data0 to Data3). The machine clocks through the 4 states, each time writing a null 64-bit data word to the buffer. Once complete, the state machine returns to the ColorSelect state to determine if further transfers are required.

If color_start is active then a data transfer is required. The state machine jumps to the Request state to issue a request to the DIU controller for DRAM access by setting llu_diu_rreq high. The DIU responds by acknowledging the request (diu_llu_rack equals 1) and then sending four 64-bit words of data. The transition from the Request state to the Data0 state signals the address generator to update the address pointer (adr_update). The state machine clocks through the Data0 to Data3 states, each time writing the 64-bit data into the buffer selected by the req_sel bus. Once complete, the state machine returns to the ColorSelect state to determine if further transfers are required.

When in the ColorSelect state and all data transfers for colors in that group have been serviced (i.e. when color_cnt is 6), the state machine returns to the Idle state. On this transition it updates the word counter logic (word_dec) and enables the request logic (req_update).

A reset, or llu_go_pulse set to 1, will cause the state machine to jump directly to Idle. The controller will remain in the Idle state until it is enabled by the LLU controller via the llu_en signal. This prevents the DIU interface from attempting to fill the DIU buffers before the dot line store FIFO has filled over its threshold level.
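The Python sketch below traces the state sequence for servicing one group at transaction level. The function name is hypothetical. It assumes req_active and llu_en are already high and that each DIU request is acknowledged immediately, so Request and each Data state appear once per visit rather than once per clock cycle. ColorSelect is listed once for each color_cnt value examined.

```python
def service_group(color_enable: int, color_start: int) -> list[str]:
    """Return the states the interface controller visits for one group.

    color_enable and color_start are 6-bit masks for the selected (odd or
    even) group, indexed by color_cnt. Transaction-level sketch: DIU
    acknowledge latency is not modelled.
    """
    states = ["Idle"]
    color_cnt = 0
    while True:
        states.append("ColorSelect")
        if color_cnt == 6:
            # all colors serviced: word_dec and req_update pulse on this transition
            states.append("Idle")
            return states
        enabled = (color_enable >> color_cnt) & 1
        started = (color_start >> color_cnt) & 1
        if enabled:
            if started:
                states.append("Request")    # llu_diu_rreq high until diu_llu_rack
            # null and DRAM transfers both write four 64-bit words to the buffer
            states += ["Data0", "Data1", "Data2", "Data3"]
        color_cnt += 1                      # disabled colors just increment the count
```

A null transfer therefore differs from a DRAM transfer only by skipping the Request state. This is why null-data buffers fill without consuming DRAM bandwidth.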

31.5.6.3 Color activate

The color activate logic maintains an absolute line count indicating the line number currently being processed by the LLU. The counter is reset when llu_go_pulse is 1 and incremented each time a line_rd pulse is received. The count value (line_cnt) is used to determine when to start reading data for a color.

The count is implemented as follows: 

if (llu_go_pulse == 1) then
   line_cnt = 0
elsif (line_rd == 1) then
   line_cnt ++

The color activate logic compares the line count with the relative line value to determine when the LLU should start reading data from DRAM for a particular half color. It signals to the interface controller block which colors are active for this dot line in a page (via the color_start bus). The interface controller uses this to determine which DIU buffers require null data.

Once the color_start bit for a color is set, it cannot be cleared during normal page processing. The bits must be reset by the CPU at the end of a page by transitioning the Go bit, which causes a pulse on the llu_go_pulse signal.

Any color not enabled by the color_enable bus will never have its color_start bit set.

for (i = 0; i < 12; i++) {
   if (llu_go_pulse == 1) then
      col_on[i] = 0
   elsif (color_enable[i / 2] == 0) then     // half color i belongs to color i/2
      col_on[i] = 0
   elsif (line_cnt == color_rel_line[i]) then
      col_on[i] = 1
}

// select either odd or even colors
if (odd_even_sel == 1) then // odd selected
   color_start[5:0] = {col_on[11], col_on[9], col_on[7], col_on[5], col_on[3], col_on[1]}
else // even selected
   color_start[5:0] = {col_on[10], col_on[8], col_on[6], col_on[4], col_on[2], col_on[0]}
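The Python sketch below models the color activate logic. The 12 col_on flags are evaluated once per line value, and one group is then packed into color_start[5:0]. Function names are hypothetical. The relative line values in the usage block are illustrative only: they follow the 5-line stagger described for Figure 226, where color 0 odd starts at line 0, color 0 even at line 5, color 1 odd at line 10, and so on.

```python
def update_col_on(col_on: list[int], line_cnt: int, color_enable: int,
                  color_rel_line: list[int], llu_go_pulse: bool = False) -> list[int]:
    """One evaluation of the 12 col_on flags (half color i belongs to color i // 2)."""
    new = []
    for i in range(12):
        if llu_go_pulse:
            new.append(0)
        elif ((color_enable >> (i // 2)) & 1) == 0:
            new.append(0)                    # disabled colors never start
        elif line_cnt == color_rel_line[i]:
            new.append(1)
        else:
            new.append(col_on[i])            # once set, held until llu_go_pulse
    return new


def select_color_start(col_on: list[int], odd_even_sel: int) -> int:
    """Pack half colors 1,3,...,11 (odd) or 0,2,...,10 (even) into color_start[5:0]."""
    bits = col_on[odd_even_sel::2]
    return sum(bit << n for n, bit in enumerate(bits))


if __name__ == "__main__":
    # Illustrative relative lines following the 5-line stagger (even = 0, odd = 1)
    rel = [5 * (i + 1) if i % 2 == 0 else 5 * (i - 1) for i in range(12)]
    col_on = [0] * 12
    for line in range(11):                   # lines 0..10
        col_on = update_col_on(col_on, line, 0x3F, rel)
    print(bin(select_color_start(col_on, 1)), bin(select_color_start(col_on, 0)))
```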



31.5.6.4 Address generator 

The address generator block maintains 12 pointers (color_adr[11:0]) to DRAM, corresponding to the current read address in the dot line store for each half color. When a DRAM transfer occurs, the address pointer is used first and then updated for the next transfer for that color. The pointer used is selected by the req_sel bus, and the pointer update is initiated by the adr_update signal from the interface controller.
The pointer update and pointer initialization depend on the pointer position in a line and the line position in the FIFO.

When llu_go_pulse is received, the pointers are each initialized to the corresponding base address for that color (color_base_adr). For each word that is read from DRAM the pointer is updated. If the word is the last word in a line (last_wd equals 1) and the line is the last line in the FIFO (fifo_end equals 1), the address pointer is re-initialized to the base address value. The pointer is incremented for all other words.

The address is calculated as follows: 
// reset to base address
if (llu_go_pulse == 1) then
   color_adr[11:0] = color_base_adr[11:0][21:5]
elsif (adr_update == 1) then
   if (req_sel == NULL) then
      // do nothing
   elsif ((fifo_end == 1) AND (last_wd == 1)) then
      color_adr[req_sel] = color_base_adr[req_sel][21:5]
   else
      color_adr[req_sel] ++          // normal increment
// select the address pointer
llu_diu_radr = color_adr[req_sel]
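The Python sketch below follows the address pointer update. Function names are hypothetical. It assumes color_base_adr already holds 256-bit word addresses, which correspond to the [21:5] slice in the pseudocode. It uses req_sel=None to stand for the NULL case. Address-width wrap-around is not modelled.

```python
from typing import Optional


def update_color_adr(color_adr: list[int], color_base_adr: list[int],
                     req_sel: Optional[int], adr_update: bool,
                     fifo_end: bool, last_wd: bool,
                     llu_go_pulse: bool = False) -> list[int]:
    """Return the 12 DRAM read pointers after one clock cycle.

    Assumes color_base_adr already holds 256-bit word addresses (the [21:5]
    slice in the pseudocode) and uses req_sel=None to stand for NULL.
    """
    if llu_go_pulse:
        return list(color_base_adr)                  # reset to base address
    adr = list(color_adr)
    if adr_update and req_sel is not None:
        if fifo_end and last_wd:
            adr[req_sel] = color_base_adr[req_sel]   # FIFO wrap point
        else:
            adr[req_sel] += 1                        # normal increment
    return adr


def llu_diu_radr(color_adr: list[int], req_sel: int) -> int:
    """Select the address pointer presented to the DIU."""
    return color_adr[req_sel]
```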



31.5.6.5 Line pointer 

The line pointer logic counts the number of dot data lines read from DRAM for each color. The counter value is used to signal the FIFO wrap point to the address generator logic. A separate counter is maintained for each color.

The end of a line can be determined when the address is updated (adr_update equal 1) and the word transferred is the last word of a line (last_wd equal 1). The line pointer that needs to be updated is selected by the req_sel bus from the write pointer block. If the selected pointer is zero, the counter is reset to the corresponding color_fifo_size value; otherwise the counter is decremented.

If the llu_go_pulse signal is high, each counter is reset to its corresponding color_fifo_size value. When the counter is zero, it sets the fifo_end bit to signal to the address generator that the FIFO has wrapped, so that the address pointer is updated accordingly.

if (llu_go_pulse == 1) then
   line_pt[11:0] = color_fifo_size[11:0]
elsif ((adr_update == 1) AND (last_wd == 1)) then {
   if (line_pt[req_sel] == 0) then
      line_pt[req_sel] = color_fifo_size[req_sel]
   else
      line_pt[req_sel] --
}

// select the correct line pointer for comparison
fifo_end = (line_pt[req_sel] == 0)
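The Python sketch below mirrors the line pointer pseudocode. The function names are hypothetical. With a color_fifo_size of 2, the selected pointer counts 2, 1, 0 and then reloads, so fifo_end is flagged on every third line for that half color.

```python
def update_line_pt(line_pt: list[int], color_fifo_size: list[int], req_sel: int,
                   adr_update: bool, last_wd: bool,
                   llu_go_pulse: bool = False) -> list[int]:
    """Return the 12 line pointers after one clock cycle."""
    if llu_go_pulse:
        return list(color_fifo_size)
    pt = list(line_pt)
    if adr_update and last_wd:                        # last word of a line transferred
        if pt[req_sel] == 0:
            pt[req_sel] = color_fifo_size[req_sel]    # FIFO wrapped: reload
        else:
            pt[req_sel] -= 1
    return pt


def fifo_end(line_pt: list[int], req_sel: int) -> bool:
    """True when the selected half color is on the last line of its FIFO."""
    return line_pt[req_sel] == 0
```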

31.5.6.6 Write pointer 

The write pointer logic maintains the buffer write address pointers, determines when the DIU buffers need a data transfer, and signals when the DIU buffers are empty. The write pointer determines the address in the DIU buffer that the data should be transferred to.

The write pointer logic compares the read and write pointers of each DIU buffer to determine which buffers require data to be transferred from DRAM (pend[11:0] bus), and which buffers are empty (the buf_emp signals). Only enabled buffers are considered, as indicated by the color_enable bus.
Buffers are grouped into odd and even buffers. If an odd buffer requires DRAM access, the odd_pend signal will be active; if an even buffer requires DRAM access, the even_pend signal will be active. If both odd and even buffers require DRAM access, the even buffers will get serviced first.

If any buffer requires a DRAM transfer, the logic indicates this to the interface controller via the req_active signal, with the odd_even_sel signal determining which group of buffers gets serviced. The interface controller will check the color_enable signal and issue DRAM transfers for all enabled colors in a group. When the transfers are complete, it tells the write pointer logic to update the request pending status via the req_update signal.

The req_sel[3:0] signal tells the address generator which buffer is being serviced. It is constructed from the odd_even_sel signal and the color_cnt[2:0] bus from the interface controller. When data is being transferred from DRAM, the word pointer and write pointer for the corresponding buffer are updated. The req_sel signal determines which pointer should be incremented.

The write pointer logic operates the same way regardless of whether the transfer is null or not. 



// determine which buffers need updates
for (i = 0; i < 12; i++) {
   // determine if request is active, filtered by color enable
   if (wr_adr[i][3:2] == rd_adr[i][3:2]) then
      pend[i] = 1
   else
      pend[i] = 0
   // determine if any enabled buffer is empty
   if ((wr_adr[i][3:0] == rd_adr[i][3:0]) AND (color_enable[i / 2] == 1)) then
      buf_emp[i] = 1
}

// Odd half colors (1,3,5,7,9,11), even half colors (0,2,4,6,8,10)
odd_pend  = (pend[1] | pend[3] | pend[5] | pend[7] | pend[9] | pend[11])
even_pend = (pend[0] | pend[2] | pend[4] | pend[6] | pend[8] | pend[10])
// fixed servicing order, only update when controller dictates so
if (req_update == 1) then {
   if (even_pend == 1) then        // even always first
      odd_even_sel = 0
      req_active = 1
   elsif (odd_pend == 1) then      // then check odd
      odd_even_sel = 1
      req_active = 1
   else                            // nothing active
      odd_even_sel = 0
      req_active = 0
}

// selected requestor
req_sel[3:0] = {color_cnt[2:0], odd_even_sel}   // concatenation
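The fixed servicing order and the req_sel concatenation can be sketched in Python as follows. The function names are hypothetical. The sketch covers only group selection and req_sel construction, not the pend comparison itself. Because odd_even_sel forms the least significant bit, req_sel equals the half color index used elsewhere: even index for the even half color, odd index for the odd.

```python
def select_group(pend: list[int]) -> tuple[int, int]:
    """Return (odd_even_sel, req_active) as sampled on a req_update pulse.

    Even half colors (0, 2, ..., 10) are always serviced before odd ones.
    """
    even_pend = any(pend[0::2])
    odd_pend = any(pend[1::2])
    if even_pend:
        return 0, 1
    if odd_pend:
        return 1, 1
    return 0, 0                       # nothing active


def make_req_sel(color_cnt: int, odd_even_sel: int) -> int:
    """req_sel[3:0] = {color_cnt[2:0], odd_even_sel}, i.e. the half color index."""
    return ((color_cnt & 0x7) << 1) | (odd_even_sel & 0x1)
```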



The write address pointer logic consists of twelve 2-bit counters and a word select pointer. The counters are reset when llu_go_pulse is one. The word pointer (word_ptr) is common to all buffers and is used to write 64-bit words into the DIU buffer. It is incremented when buf_wr_en is active. If word_ptr is 3 and buf_wr_en is active, the selected write pointer (wr_ptr[req_sel]) will be incremented. A concatenation of the write pointer and the word pointer is used to construct the buffer write address. The write pointers are not reset at the end of each line.



// determine which pointer to update
if (buf_wr_en == 1) then {
   wr_adr[req_sel]
   wr_en[req_sel] = 1
}
// determine which pointer to update
if (llu_go_pulse == 1) then
   wr_ptr[11:0] = 0
   word_ptr = 0
elsif (buf_wr_en == 1) then {
   word_ptr ++
   if (word_ptr == 3) then
      wr_ptr[req_sel] ++
}

// create the address from the write pointer and word pointer
wr_adr[req_sel] = {wr_ptr[req_sel], word_ptr}   // concatenation
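The following is a Python sketch of the write pointer counters, with hypothetical class and method names. It assumes the registered semantics of the pseudocode: the word_ptr == 3 test sees the value from before the increment, so the selected wr_ptr advances on every fourth 64-bit write, once per 256-bit word.

```python
class WritePointers:
    """Sketch of the 12 2-bit write pointers and the shared 2-bit word pointer."""

    def __init__(self) -> None:
        self.wr_ptr = [0] * 12
        self.word_ptr = 0

    def clock(self, buf_wr_en: bool, req_sel: int, llu_go_pulse: bool = False) -> None:
        if llu_go_pulse:
            self.wr_ptr = [0] * 12
            self.word_ptr = 0
        elif buf_wr_en:
            # compare the old word_ptr, as the registered pseudocode does
            if self.word_ptr == 3:
                self.wr_ptr[req_sel] = (self.wr_ptr[req_sel] + 1) & 0x3
            self.word_ptr = (self.word_ptr + 1) & 0x3

    def wr_adr(self, req_sel: int) -> int:
        """Buffer write address {wr_ptr[req_sel], word_ptr}."""
        return (self.wr_ptr[req_sel] << 2) | self.word_ptr
```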



31.5.6.7 Word count



The word count logic maintains 2 counters to track the number of words transferred from DRAM per line, one counter for odd data and one counter for even. On receipt of llu_go_pulse, the counters are initialized to the color_line_inc value (number of words per line). When a group of words is transferred from DRAM, as indicated by the word_dec signal from the interface controller, the corresponding counter is decremented. The counter to decrement is indicated by the odd_even_sel signal from the write pointer block (even = 0, odd = 1).

When a counter is zero, the last_wd signal for that group (i.e. odd or even) is set. The last_wd signal indicates to the address generator that the next word transferred from DRAM for the corresponding color is the last word in the line. When the last word actually gets transferred, the interface controller will pulse the word_dec signal, causing the corresponding word count to reset to the color_line_inc value.

// determine which counter to decrement
if (llu_go_pulse == 1) then
   word_cnt[0] = color_line_inc      // even count
   word_cnt[1] = color_line_inc      // odd count
elsif (word_dec == 1) then {         // need to decrement one word counter
   if (word_cnt[odd_even_sel] == 0) then    // line finish
      word_cnt[odd_even_sel] = color_line_inc
   else
      word_cnt[odd_even_sel] --
}

// select the correct last_wd
last_wd = (word_cnt[odd_even_sel] == 0)
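The Python sketch below models the two word counters. The class name is hypothetical. Following the text, index 0 is the even counter and index 1 the odd counter, selected by odd_even_sel. Each word_dec pulse affects only the selected counter. last_wd is raised when that counter reaches zero and clears once the reload happens.

```python
class WordCount:
    """Sketch of the two per-line word counters (index 0 = even, 1 = odd)."""

    def __init__(self, color_line_inc: int) -> None:
        self.color_line_inc = color_line_inc
        self.word_cnt = [color_line_inc, color_line_inc]

    def last_wd(self, odd_even_sel: int) -> bool:
        return self.word_cnt[odd_even_sel] == 0

    def clock(self, word_dec: bool, odd_even_sel: int, llu_go_pulse: bool = False) -> None:
        if llu_go_pulse:
            self.word_cnt = [self.color_line_inc, self.color_line_inc]
        elif word_dec:
            if self.word_cnt[odd_even_sel] == 0:
                self.word_cnt[odd_even_sel] = self.color_line_inc   # line finish: reload
            else:
                self.word_cnt[odd_even_sel] -= 1
```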

The word count logic also determines when a complete line has been read from DRAM. It then signals the fifo fill level logic in both the LLU (via the line_rd signal) and the DWU (via llu_dwu_line_rd) that a complete line has been read by the LLU.
// line finish logic
if (llu_go_pulse == 1) then
   line_fin = 0
   line_rd = 0
elsif ((last_wd == 1) AND (line_fin == 0) AND (word_dec == 1)) then
   line_fin = 1          // first group last_wd finish pulse
   line_rd = 0
elsif ((last_wd == 1) AND (line_fin == 1) AND (word_dec == 1)) then
   line_fin = 0          // second group last_wd finish pulse
   line_rd = 1
else
   line_fin = line_fin   // stay the same
   line_rd = 0
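The line finish logic is a two-step handshake: the first group to finish sets line_fin, and the second group produces the line_rd pulse. The Python sketch below is a direct transcription of the pseudocode, with a hypothetical function name.

```python
def line_finish_step(line_fin: int, last_wd: bool, word_dec: bool,
                     llu_go_pulse: bool = False) -> tuple[int, int]:
    """Return the next (line_fin, line_rd) pair.

    line_fin records that the first group has finished its line; line_rd is a
    one-cycle pulse once the second group finishes too.
    """
    if llu_go_pulse:
        return 0, 0
    if last_wd and word_dec and line_fin == 0:
        return 1, 0                   # first group last_wd finish pulse
    if last_wd and word_dec and line_fin == 1:
        return 0, 1                   # second group last_wd finish pulse
    return line_fin, 0                # stay the same
```

A complete line is therefore reported to the fill level logic only after both the odd and the even half lines have been read.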
32 PrintHead Interface (PHI)

32.1 Overview

The Printhead interface (PHI) accepts dot data from the LLU and transmits the dot data to the printhead, using the printhead interface mechanism. The PHI generates the control and timing signals necessary to load and drive the bi-lithic printhead. The CPU determines the line update rate to the printhead and adjusts the line sync frequency to produce the maximum print speed, accounting for the printhead ICs' size ratio and the inherent latencies in the syncing system across multiple SoPECs.

The PHI also needs to consider the order in which dot data is loaded in the printhead. This is dependent on 
the construction of the printhead and the relative sizes of printhead ICs used to create the printhead. See 
Bi-lithic Printhead Reference document for a complete description of printhead types [10]. 

The printing process is a real-time process. Once the printing process has started, the next Printline's data must be transferred to the printhead before the next line sync pulse is received by the printhead. Otherwise the printing process will terminate with a buffer underrun error.

The PHI can be configured to drive a single printhead IC with or without synchronization to other SoPECs. For example, the PHI could drive a single IC printhead (i.e. a printhead constructed with one IC only), or a dual IC printhead with one SoPEC device driving each printhead IC.

The PHI interface provides a mechanism for the CPU to directly control the PHI interface pins, allowing the CPU to access the bi-lithic printhead to:

• determine printhead temperature 

• test for and determine dead nozzles for each printhead IC 

• initialize each printhead IC 

• pre-heat each printhead IC 

Figure 233 shows a high level data flow diagram of the PHI in context. 



Figure 233. High level data flow diagram of PHI in context
32.2 Printhead modes of operation



Both printhead ICs must be in the same mode of operation. The modes of operation are defined in Table 158.



Table 158. Printhead modes of operation

Normal print mode, dot data is clocked into the printhead.

DOT_LOAD/FIRE_INIT   1   0
   Dot load mode: data stored in the dot shift register is transferred into the dot latch on the falling edge of phi_lsyncl, and latched in on the rising edge of phi_lsyncl.
   Fire load mode: parameters for generating the fire pattern are loaded into the generator; data on phi_ph_data is clocked into the generator on each rising edge of phi_frclk.

TEST_MODE   0   0
   Dot load mode: data stored in the dot shift register is transferred into the dot register on the rising edge of phi_lsyncl, identical to DOT_LOAD.
   phi_srclk = 0: the printhead is in test mode. The temperature delta sigma is clocked out of the printhead on the rising edge of phi_frclk through phi_ph_data[1:0][1]. The result of the nozzle test is clocked out of the printhead through phi_ph_data[1:0][0].

FIRE_GEN   0   1   N/A
   The nozzle test circuit is reset.
   CMOS testing mode: the dot shift register is scanned out of the printhead on the falling edge of phi_srclk. Data is output on phi_ph_data[1:0][1:0].
   The initialised generator creates the fire pattern and shift select pattern, and the pattern is clocked into the fire shift register and select shift register on the rising edge of phi_frclk.



32.3 Data rate equalization 



Transferring dot data to both printhead ICs at the full clock rate requires that the LLU be able to supply data at the maximum rate to both ICs. With data rate equalization, the PHI transfers dot data to the printhead at a pre-programmed rate, proportional to the ratio of the shorter to the longer printhead IC.



Figure 235. Printhead data rate equalization (panels: without and with rate equalization, 7:3 head)

The printhead data rate equalization is controlled by the PrintHeadRate[1:0] registers (one per printhead IC). Each register is a 16-bit bitmap of active clock cycles in a 16 clock cycle window. For example, if the register is set to 0xFFFF then the output rate to the printhead will be full rate; if it is set to 0xF0F0 then the output rate is 50%, with 4 active cycles followed by 4 inactive cycles and so on. If the register is set to 0x0000 the rate would be 0%. The relative data transfer rate of the printhead can be varied from 0-100% in steps of 1/16.
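The relative rate is the number of set bits in the bitmap divided by 16. The Python sketch below computes it and shows which cycles of the window are active. The function names are hypothetical. The bit order is an assumption: bit 15 is taken as the first cycle of the window, which agrees with 0xF0F0 giving 4 active cycles followed by 4 inactive cycles, but the text does not state it.

```python
def rate_percent(print_head_rate: int) -> float:
    """Relative data rate of a PrintHeadRate bitmap: active cycles out of 16."""
    return 100.0 * bin(print_head_rate & 0xFFFF).count("1") / 16


def cycle_active(print_head_rate: int, cycle: int) -> bool:
    """Whether data is clocked in a given cycle of the repeating 16-cycle window.

    Assumption: bit 15 corresponds to the first cycle of the window.
    """
    return bool((print_head_rate >> (15 - cycle % 16)) & 1)
```

For example, 0x5551 has 7 bits set, giving 43.75%. This is the 7:3 entry listed as 43.7% in Table 159.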



Table 159. Example rate equalization values for common printheads 









Printhead   PrintHeadRate[0]   PrintHeadRate[1]
8:2         0xFFFF (100%)      0x1111 (25%)
7:3         0xFFFF (100%)      0x5551 (43.7%)
6:4         0xFFFF (100%)      0xF1F2 (68.7%)
5:5         0xFFFF (100%)      0xFFFF (100%)



If both printhead ICs are the same size (e.g. a 5:5 printhead) it may be desirable to reduce the data rate to 
both printhead ICs, to reduce the read bandwidth from the DRAM. 
32.4 Dot generate and transmit order

Several printhead types and arrangements exist (see Section 35 Memjet Printhead). The PHI is capable of driving all possible configurations, but for simplicity only one arrangement (arrangement 0, see Section 35 Memjet Printhead) is described in the following examples.



Figure 236. Printhead structure and dot generate order (type 0 and type 1 printhead ICs; M - midway point in dots, N - number of dots in a line; paper passing under printhead)

The structure of the printhead ICs dictates the dot transmit order to each printhead IC. The PHI accepts two
streams of dot data from the LLU, one even stream and the other odd. The PHI constructs the dot transmit
order streams from the dot generate order received from the LLU. Each stream of data has already been
arranged in increasing or decreasing dot order sense by the DWU. The exact sense choice is dependent on
the type of printhead ICs used to construct the printhead, but regardless of configuration the odd and even
streams should be of opposing sense.

The dot transmit order is shown in Figure 236. Dot data is shifted into the printhead in the direction of the
arrow, so from the diagram (taking the type 0 printhead IC) even dot data is transferred in increasing order
to the mid point first (0, 2, 4, ..., m-6, m-4, m-2), then odd dot data is transferred in decreasing order
(m-1, ..., 5, 3, 1). For the type 1 printhead IC the order is reversed, with odd dots in decreasing order
transmitted first, followed by even dot data in increasing order. Note that for any given color the odd and even
dot data transferred to the printhead ICs are from different dot lines; in the example in the diagram they are
separated by 5 dot lines. Table 160 shows the transmit dot order for some common A4 printheads. Different
type printheads may have the sense reversed and may have an odd before even transmit order or vice
versa.
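The per-IC ordering can be sketched in Python as follows, following Figure 236 and the entries of Table 160 (type 0: even dots ascending, then odd dots descending; type 1: odd dots descending, then even dots ascending). This is a minimal sketch under stated assumptions: `ic_transmit_order` is a hypothetical helper, and it assumes each IC covers a contiguous, even-aligned, even-sized range of dots in the line.

```python
def ic_transmit_order(first_dot: int, size: int, ic_type: int) -> list[int]:
    """Dot indices in the order they are shifted into one printhead IC.

    first_dot: index of the IC's first dot in the line (assumed even).
    size:      number of dots on the IC (assumed even).
    ic_type:   0 or 1, as in Figure 236.
    """
    if first_dot % 2 or size % 2:
        raise ValueError("sketch assumes an even first dot and an even IC size")
    last_dot = first_dot + size - 1
    evens_up = list(range(first_dot, last_dot, 2))    # even dots, increasing
    odds_down = list(range(last_dot, first_dot, -2))  # odd dots, decreasing
    if ic_type == 0:
        return evens_up + odds_down
    if ic_type == 1:
        return odds_down + evens_up
    raise ValueError("ic_type must be 0 or 1")
```

For a 7:3 A4 printhead, `ic_transmit_order(9744, 4080, 1)` starts with 13823, 13821, 13819 and ends with 13818, 13820, 13822, matching the type 1 pattern in Table 160.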

Table 160. Example printhead ICs, and dot data transmit order for A4 (13824 dots) page

Type 0 printhead IC
IC size  Dots    First transmitted                          Then transmitted
8        11160   0, 2, 4, 6 ... 5574, 5576, 5578            5579, 5577, 5575 ... 7, 5, 3, 1
7        9744    0, 2, 4, 6 ... 4866, 4868, 4870            4871, 4869, 4867 ... 7, 5, 3, 1
6        8328    0, 2, 4, 6 ... 4158, 4160, 4162            4163, 4161, 4159 ... 7, 5, 3, 1
5        6912    0, 2, 4, 6 ... 3450, 3452, 3454            3455, 3453, 3451 ... 7, 5, 3, 1
4        5496    0, 2, 4, 6 ... 2742, 2744, 2746            2747, 2745, 2743 ... 7, 5, 3, 1
3        4080    0, 2, 4, 6 ... 2034, 2036, 2038            2039, 2037, 2035 ... 7, 5, 3, 1

Type 1 printhead IC
IC size  Dots    First transmitted                          Then transmitted
8        11160   13823, 13821, 13819 ... 1337, 1335, 1333   1332, 1334, 1336 ... 13818, 13820, 13822
7        9744    13823, 13821, 13819 ... 2045, 2043, 2041   2040, 2042, 2044 ... 13818, 13820, 13822
6        8328    13823, 13821, 13819 ... 2753, 2751, 2749   2748, 2750, 2752 ... 13818, 13820, 13822
5        6912    13823, 13821, 13819 ... 3461, 3459, 3457   3456, 3458, 3460 ... 13818, 13820, 13822
4        5496    13823, 13821, 13819 ... 4169, 4167, 4165   4164, 4166, 4168 ... 13818, 13820, 13822
3        4080    13823, 13821, 13819 ... 4877, 4875, 4873   4872, 4874, 4876 ... 13818, 13820, 13822
2        2664    13823, 13821, 13819 ... 5585, 5583, 5581   5580, 5582, 5584 ... 13818, 13820, 13822



32.4.1 Dual Printhead IC 

Figure 237. Dot data generated and transmitted order (example: line with 13824 dots, with 7:3 printhead. The
LLU generates the odd and even dot streams in 6912 clock cycles; the PHI transmits them on printhead
channels A and B, switching the connections at the mid point; channel B is active for 2040 clock cycles in
each half of the 9744 clock cycle line. Even dots are from line Y, odd dots from line Y-5.)



The LLU contains 2 dot generator units. Each dot generator reads dot data from DRAM and generates a 
stream of dots in increasing or decreasing order. A dot generator can be configured to produce odd or even 
dot data streams, and the dot sense is also configurable. In Figure 237 the odd dot generator is configured 
to produce odd dot data in decreasing order and the even dot generator produces dot data in increasing 
order. 

In order to reconstruct the dot data streams from the generate order to the transmit order, the connection 
between the generators and transmitters needs to be switched at the mid point. At line start the odd dot 
generator feeds the type 1 printhead, and the even dot generator feeds the type 0 printhead. This continues 
until both printheads have received half the number of dots they require (defined as the mid point). The 
mid point is calculated from the configured printhead size registers (PrintHeadSize). Once both printheads 
have reached the mid point, the PHI switches the connections between the dot generators and the print- 
head, so now the odd dot generator feeds the type 0 printhead and the even dot generator feeds the type 1 
printhead. This continues until the end of the line. 
It is possible that the two printheads will not be the same size, and as a result one dot generator may reach the
mid point before the other. In such cases the quicker dot generator is stalled until both dot generators reach
the mid point; the connections are then switched and both dot generators are restarted.
Note that in the example shown in Figure 237 the dot generators could generate an A4 line of data in 6912
cycles, but because of the mismatch in the printhead IC sizes the transmit time takes 9744 cycles.
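The effect of the mid point switch on line timing can be modelled with a few lines of Python. This sketch assumes full rate (PrintHeadRate = 0xFFFF), one dot per phi_srclk cycle on each channel and even IC sizes; `line_transmit_timing` is a hypothetical helper, not a PHI register or function.

```python
def line_transmit_timing(size_a: int, size_b: int) -> dict:
    """Timing of one line for a dual-IC printhead (channel A type 0, channel B type 1)."""
    if size_a % 2 or size_b % 2:
        raise ValueError("sketch assumes even printhead IC sizes")
    half_a, half_b = size_a // 2, size_b // 2
    # Each dot generator produces one stream (even or odd) for the whole line.
    generate_cycles = (size_a + size_b) // 2
    # Connections are only switched once BOTH ICs reach their mid point, so each
    # half of the line lasts as long as the larger IC's half.
    half_line = max(half_a, half_b)
    return {
        "generate_cycles": generate_cycles,
        "mid_point_cycle": half_line,
        "transmit_cycles": 2 * half_line,
        # Cycles the quicker generator spends stalled in each half of the line.
        "stall_cycles_per_half": abs(half_a - half_b),
    }
```

For the 7:3 example this gives 6912 generate cycles and 9744 transmit cycles, and setting channel B to zero (the single IC case described next) gives 4872 and 9744.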

32.4.2 Single printhead IC

In some cases only one printhead IC may be connected to the PHI. In Figure 238 the dot generate and
transmit order is shown for a single IC printhead of 9744 dots width. While the example shows the printhead
IC connected to channel A, either channel could be used. The LLU generates odd and even dot streams
and has no knowledge of the physical printhead configuration. The PHI is configured
with the printhead size (PrintHeadSize[1:0] registers) for channel B set to zero and for channel A set to 9744.

Figure 238. Dot data generated and transmitted order (single printhead case) (example: line with 9744 dots,
with 7:0 printhead. The LLU generates each dot stream in 4872 clock cycles; channel A switches from even to
odd dots at the mid point, after 4872 clock cycles, and transmits for 9744 clock cycles in total; channel B is
unused. Even dots are from line Y, odd dots from line Y-5.)

Note that in the example shown in Figure 238 the dot generators could generate a 7-inch line of data in
4872 cycles, but because the printhead is using one IC the transmit time takes 9744 cycles, the same speed
as an A4 line with a 7:3 printhead.

32.4.3 Summary of generate and transmit order requirements 

In order to support all the possible printhead arrangements, the PHI (in conjunction with the LLU/DWU)
must be capable of re-ordering the bits according to the following criteria:

• Be able to output the even or odd plane first.

• Be able to output even and odd planes independently.

• Be able to reverse the sequence in which the color planes of a single dot are output to the printhead.
32.5 Print sequence

The PHI is responsible for accepting dot data streams from the LLU, restructuring the dot data sequence
and transferring the dot data to each printhead within a line time (i.e. before the next line sync).

Before a page can be printed the printhead ICs must be initialized. The exact initialization sequence is con- 
figuration dependent, but will involve the fire pattern generation initialization and other optional steps. The 
initialization sequence is implemented in software. 

Once the first line of data has been transferred to the printhead, the PHI will interrupt the CPU by asserting
the phi_icu_print_rdy signal. The interrupt can be optionally masked in the ICU and the CPU can poll the
signal via the PCU or the ICU. The CPU must wait for a print ready signal in all printing SoPECs before
starting printing.

Once the CPU in the PrintMaster SoPEC is satisfied that printing should start, it triggers the LineSyncMaster
SoPEC by writing to the PrintStart register of all printing SoPECs. The transition of the PrintStart
register in the LineSyncMaster SoPEC will trigger the start of phi_lsyncl pulse generation. The PrintMaster
and LineSyncMaster SoPEC are not necessarily the same device, but often are. For a more in-depth
definition see section 12.3 Multi-SoPEC systems on page 104.

Writing to the PrintStart register generates a pulse which is used to generate the line sync in the
LineSyncMaster, which is in turn used to align all SoPECs in a multi-SoPEC system. All printhead signaling is
aligned to the line sync. PrintStart is only used to align the first line sync in a page.

When a SoPEC receives a line sync pulse it means that the line previously transferred to the printhead is
now printing, so the PHI can begin to transfer the next line of data to the printhead. When the transfer is
complete the PHI will wait for the next line sync pulse before repeating the cycle. If a line sync arrives
before a complete line is transferred to the printhead (i.e. a buffer error), the PHI generates a buffer
underrun interrupt and halts the block.

For each line in a page the PHI must transfer a full line of data to the printhead before the next line sync is 
generated or received. 
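The line-by-line rule above can be checked with a small Python model. It is a sketch, not PHI behavior in every detail: `first_underrun` is a hypothetical helper, transfer times are given in phiclk cycles, and it assumes that a transfer finishing on the same cycle as the next line sync already counts as late.

```python
from typing import Optional, Sequence


def first_underrun(transfer_cycles: Sequence[int], lsync_period: int) -> Optional[int]:
    """Index of the first line whose transfer misses its line sync, or None.

    Line 0 is loaded before printing starts (the CPU waits for phi_icu_print_rdy
    before writing PrintStart), so it is not bound by the line sync period.
    Every later line is loaded while the previous line prints and must be
    complete before the next line sync arrives.
    """
    for line, cycles in enumerate(transfer_cycles[1:], start=1):
        if cycles >= lsync_period:  # assumption: finishing on the sync edge is late
            return line
    return None
```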

32.5.1 Sync pulse control 

If the PHI is configured as the LineSyncMaster SoPEC it will start generating line sync signals LsyncPre
phiclk cycles after a rising transition of the PrintStart register is detected. All other signals in the PHI
interface are referenced from the falling edge of the phi_lsyncl signal.

If the SoPEC is in line sync slave mode it will receive a line sync pulse from the LineSyncMaster SoPEC
through the phi_lsyncl pin, which will be programmed into input mode. The phi_lsyncl input pin is treated
as an asynchronous input and is passed through a de-glitch circuit of programmable de-glitch duration
(LsyncDeglitchCnt).

The phi_lsyncl will remain low for LsyncLow cycles, and then high for LsyncHigh cycles. The phi_lsyncl
profile is repeated until the page is complete. The period of the phi_lsyncl is given by LsyncLow + LsyncHigh
cycles. Note that the LsyncPre value is only used to vary the time between the PrintStart indication from the
CPU and the generation of the first phi_lsyncl. See Figure 239 for a reference diagram.

If the SoPEC device is in line sync slave mode, the LsyncMinPeriod register specifies the minimum
allowed phi_lsyncl period. Any phi_lsyncl pulses received before the LsyncMinPeriod has expired will
trigger a buffer underrun error.
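The slave-mode input handling can be illustrated in Python. The exact de-glitch circuit is not described here, so this sketch assumes a simple consecutive-sample filter: the filtered level only changes after the input has held a new level for LsyncDeglitchCnt phiclk samples (reset value 0x3). `deglitch` and `early_falling_edges` are hypothetical helpers operating on per-cycle sample lists.

```python
def deglitch(samples: list[int], count: int) -> list[int]:
    """Follow the input only after it holds a new level for `count` samples."""
    if count < 1:
        raise ValueError("count must be at least 1")
    out = []
    state = samples[0] if samples else 1
    run = 0  # consecutive samples that disagree with the filtered state
    for level in samples:
        if level != state:
            run += 1
            if run >= count:
                state, run = level, 0
        else:
            run = 0
        out.append(state)
    return out


def early_falling_edges(levels: list[int], min_period: int) -> list[int]:
    """Falling edges that arrive less than `min_period` cycles after the previous one."""
    falls = [i for i in range(1, len(levels)) if levels[i - 1] == 1 and levels[i] == 0]
    return [b for a, b in zip(falls, falls[1:]) if b - a < min_period]
```

Each index returned by `early_falling_edges` corresponds to a pulse that, per the text above, would raise a buffer underrun error.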

32.5.2 Shift register signal control 

Once the PHI receives the line sync pulse, the sequence of data transfer to the printhead begins. All PHI 
control signals are specified from the falling edge of the line sync. 
The phi_srclk (and consequently phi_ph_data) is controlled by the SrclkPre and SrclkPost registers. The
SrclkPre register specifies the number of phiclk cycles between the phi_lsyncl falling edge and the PHI
beginning to transfer data to the printhead.

The rate of the phi_srclk is controlled by the PrintHeadRate register and by the availability of dot data: it is
possible that the input FIFO could empty, in which case no data would be transferred to the printhead while
the PHI was waiting. After all the data for a printhead is transferred, the PHI counts SrclkPost number of
phiclk cycles. If a new phi_lsyncl falling edge arrives before the count is complete the PHI will generate a
buffer underrun interrupt (phi_icu_underrun).
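A rough budget check for the shift register timing can be sketched in Python. Assumptions, all hypothetical: bit 0 of PrintHeadRate is the first cycle of the repeating 16-cycle window, the window restarts when the transfer starts, one dot is shifted per active phi_srclk cycle, the LLU never starves the input FIFO, and the mid point synchronization between channels (section 32.4.1) is ignored. `cycles_to_shift` and `fits_in_line` are illustrative helpers.

```python
def cycles_to_shift(dots: int, rate_mask: int) -> int:
    """phiclk cycles needed to shift `dots` dots when phi_srclk is gated by a
    16-bit PrintHeadRate pattern that repeats every 16 cycles."""
    active = [bit for bit in range(16) if (rate_mask >> bit) & 1]
    if not active:
        raise ValueError("a 0% rate never shifts any dots")
    if dots == 0:
        return 0
    window, index = divmod(dots - 1, len(active))
    # The last dot is shifted on the index-th active cycle of the given window.
    return window * 16 + active[index] + 1


def fits_in_line(srclk_pre: int, shift_cycles: int, srclk_post: int,
                 lsync_low: int, lsync_high: int) -> bool:
    """True if SrclkPre + transfer + SrclkPost fits in one phi_lsyncl period."""
    return srclk_pre + shift_cycles + srclk_post <= lsync_low + lsync_high
```

Under these assumptions a 4080-dot IC at the 0x5551 rate of Table 159 needs 9325 cycles, which is within the 9744 cycles of the 9744-dot IC at full rate.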

32.5.3 Firing sequence signal control

The phi_frclk pulse profile is determined by 4 registers: FrclkPre, FrclkLow, FrclkHigh and
FrclkNum. The FrclkPre register specifies the number of cycles between the line sync falling edge and the
phi_frclk pulse going high. It remains high for FrclkHigh cycles and then low for FrclkLow cycles. The number
of pulses generated per line is determined by the FrclkNum register.

The phi_profile pin is specified in a similar manner by the ProfilePre, ProfileLow, ProfileHigh and ProfileNum
registers.

The phi_frclk period and the phi_profile period should be programmed the same, so FrclkHigh + FrclkLow
should equal ProfileHigh + ProfileLow, and the number of pulses of each in a line time should also be
equal, i.e. FrclkNum = ProfileNum.

The total time taken to complete a firing sequence should be less than the phi_lsyncl period,
i.e. ((ProfileHigh + ProfileLow) * ProfileNum) + ProfilePre < (LsyncLow + LsyncHigh).
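The three programming rules above, together with the register widths from Table 164, can be collected into a configuration check. This is a minimal Python sketch; `check_fire_timing` and the dictionary-of-register-values input are hypothetical conveniences, not a SoPEC software interface.

```python
# Register widths in bits, from Table 164.
WIDTHS = {
    "ProfilePre": 14, "ProfileLow": 14, "ProfileHigh": 14, "ProfileNum": 16,
    "FrclkPre": 14, "FrclkLow": 14, "FrclkHigh": 14, "FrclkNum": 16,
    "LsyncLow": 16, "LsyncHigh": 16,
}


def check_fire_timing(regs: dict) -> list[str]:
    """Return a list of rule violations (empty if the configuration is consistent)."""
    problems = []
    for name, width in WIDTHS.items():
        if not 0 <= regs[name] < (1 << width):
            problems.append(f"{name} does not fit in {width} bits")
    if regs["FrclkHigh"] + regs["FrclkLow"] != regs["ProfileHigh"] + regs["ProfileLow"]:
        problems.append("phi_frclk and phi_profile periods differ")
    if regs["FrclkNum"] != regs["ProfileNum"]:
        problems.append("FrclkNum != ProfileNum")
    fire_time = (regs["ProfileHigh"] + regs["ProfileLow"]) * regs["ProfileNum"] + regs["ProfilePre"]
    if not fire_time < regs["LsyncLow"] + regs["LsyncHigh"]:
        problems.append("firing sequence does not fit in the phi_lsyncl period")
    return problems
```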





Figure 239. Printhead interface timing parameters (waveforms of phi_lsyncl, phi_srclk, phi_ph_data, phi_frclk
and phi_profile, annotated with LsyncPeriod, LsyncHigh, SrclkPre, SrclkPost, FrclkPre, FrclkHigh, FrclkLow,
ProfilePre, ProfileHigh and ProfileLow)

Figure 239 details the timing parameters controlling the PHI. All timing parameters are measured in number
of phiclk cycles.



32.5.4 Page complete 



The PHI counts the number of lines processed through the interface. The line count is initialised to
PageLenLine and decrements each time a line is processed. When the line count reaches zero it pulses the
phi_icu_page_finish signal. A pulse on phi_icu_page_finish automatically resets the PHI Go register,
and can optionally cause an interrupt to the CPU. Should the page terminate abnormally, i.e. due to a buffer
underrun, the Go register will be reset and an interrupt generated.



32.5.5 Line sync interrupt

The PHI will generate an interrupt to the CPU after a predefined number of line syncs have occurred. The
number of line syncs to count is configured by the LineSyncInterrupt register. The interrupt can be
disabled by setting the register to zero.

32.6 Dot line margin

The PHI block allows the generation of margins on either side of the page received from the LLU block. This
allows the page width used within PEP blocks to differ from the physical printhead size.

This allows SoPEC to store data for a page minus the margins, resulting in lower storage requirements in the
shared DRAM and reduced memory bandwidth requirements. The difference between the dot data line
size and the line length generated by the PHI is the dot line margin length. There are two margins specified
for any sheet, one margin per printhead IC side.

The margin value is set by programming the DotMargin register per printhead IC. Note that
the DotMargin register represents half the width of the actual margin (either left or right margin depending
on paper flow direction). For example, if the margin is 1 inch (1600 dots), then DotMargin should
be set to 800. The reason for this is that the PHI only supports margin creation cases 1 and 3 described
below. See example in Figure 240.



Figure 240. Printhead timing with margining (type 0 head IC of 9744 dots and type 1 head IC of 4080 dots,
with a 200-dot margin on the type 0 IC; phi_lsyncl, LLU data, phi_ph_data and phi_srclk waveforms for
Case 1, Case 2 and Case 3, annotated with 9544 dots and 100 dots)



In the example the margin for the type 0 printhead IC is set at 100 dots (DotMargin = 100), implying an
actual margin of 200 dots.

If case 1 is used the PHI takes a total of 9744 phi_srclk cycles to load the dot data into the type 0 printhead
IC, and all 9744 dots must be read from the DRAM. In this case the first 100 and last 100 dots would be zero
but are processed through the SoPEC system, consuming memory and DRAM bandwidth at each step.

In case 2 the LLU no longer generates the margin dots; the PHI generates the zeroed-out dots for the
margining. The phi_srclk still needs to toggle 9744 times per line, although the LLU only needs to generate
9544 dots, giving the reduction in DRAM storage and associated bandwidth. The case 2 scenario is not
supported by the PHI because the same effect can be achieved by means of case 1 and case 3.
If case 3 is used the benefits of case 2 are achieved, but the phi_srclk no longer needs to toggle the full
9744 times per line. The phi_srclk cycle count can be reduced by the margin amount (in this case
9744-100=9644 dots), and due to the reduction in phi_srclk cycles the phi_lsyncl period could also be reduced,
increasing the line processing rate and consequently increasing print speed. Case 3 works by shifting the
odd (or even) dots of a margin from line Y to become the even (or odd) dots of the margin of line Y-4 (Y-5
adjusted due to being printed one line later). This works for all lines with the exception of the first line,
where there has been no previous line to generate the zeroed-out margin. This situation is handled by adding
the line reset sequence to the printhead initialization procedure, and is repeated between pages of a
document. See section 32.8.3 on page 512.
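The margin arithmetic can be summarized in a short Python sketch. It uses DotMargin = margin / 2, the PrintHeadSize adjustment given in Table 164 (PrintHeadSize = physical size - DotMargin * 2), and the case 3 phi_srclk count from the example above (physical size - DotMargin). `margin_config` is a hypothetical helper; the case 1 entries assume the LLU generates the margin dots itself.

```python
def margin_config(physical_dots: int, margin_dots: int) -> dict:
    """Register values and per-line work for one printhead IC with a margin."""
    if margin_dots % 2:
        raise ValueError("DotMargin holds half the margin, so the margin must be even")
    dot_margin = margin_dots // 2
    print_head_size = physical_dots - 2 * dot_margin  # non-margin dots (Table 164)
    return {
        "DotMargin": dot_margin,
        "PrintHeadSize": print_head_size,
        # Case 1: the LLU generates the margin as zero dots read from DRAM.
        "llu_dots_case1": physical_dots,
        "srclk_cycles_case1": physical_dots,
        # Case 3: the LLU skips the margin and the PHI reuses the previous line's dots.
        "llu_dots_case3": print_head_size,
        "srclk_cycles_case3": physical_dots - dot_margin,
    }
```

For the Figure 240 example, `margin_config(9744, 200)` gives DotMargin = 100, PrintHeadSize = 9544 and 9644 phi_srclk cycles in case 3.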

32.7 Dot counter 

The PHI keeps a dot usage count for each of the color planes (called AccumDotCount). If a
dot is used in a particular color plane the corresponding counter is incremented. Each counter is 32 bits wide
and saturates if not reset. A write to the DotCountSnap register causes the AccumDotCount[N] values to
be transferred to the DotCount[N] registers (where N is 0 to 5, one per color). The AccumDotCount
registers are cleared on value transfer.

The DotCount[N] registers can be written to or read from by the CPU at any time. On reset the counters
are reset to zero.

The dot counters only count dots that are passed from the LLU through the PHI to the printhead. Any dots
generated by direct CPU control of the PHI pins will not be counted.
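The counter and snapshot behavior can be modelled in Python as follows. `DotCounters` is a hypothetical class for illustration; it assumes each dot arrives as a 6-bit value with one bit per color plane, as on the llu_phi_data bus (Table 163), and does not model CPU writes to DotCount[N].

```python
class DotCounters:
    """Model of the six per-color dot counters (AccumDotCount and DotCount)."""

    MAX = (1 << 32) - 1  # 32-bit counters saturate rather than wrap

    def __init__(self, planes: int = 6):
        self.accum = [0] * planes      # AccumDotCount[N], internal
        self.dot_count = [0] * planes  # DotCount[N], CPU-visible

    def count_dot(self, plane_bits: int) -> None:
        """plane_bits: bit N is set if color plane N is used for this dot."""
        for n in range(len(self.accum)):
            if (plane_bits >> n) & 1 and self.accum[n] < self.MAX:
                self.accum[n] += 1

    def snap(self) -> None:
        """Equivalent of a write to DotCountSnap: transfer, then clear."""
        self.dot_count = list(self.accum)
        self.accum = [0] * len(self.accum)
```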

32.8 CPU IO control

The PHI interface provides a mechanism for the CPU to directly control the PHI interface pins, allowing 
the CPU to access the bi-lithic printhead: 

• Determine printhead temperature 

• Test for and determine dead nozzles for each printhead IC 

• Printhead IC initialization 

• Printhead pre-heat function

The CPU can gain direct control of the printhead interface connections by setting the PrintHeadCpuCtrl
register to one. Once enabled, the printhead bits are driven directly by the PrintHeadCpuOut control register,
where the values in the register are reflected directly on the printhead pins, and the status of the printhead
input pins can be read directly from PrintHeadCpuIn. The direction of the pins is controlled by
programming the PrintHeadCpuDir register. The register to pin mapping is as follows:



Table 161. CPU control and status registers mapping to printhead interface

Register          Bits  Printhead interface signal
PrintHeadCpuOut   1:0   phi_ph_data_o[0][1:0]
                  3:2   phi_ph_data_o[1][1:0]
                  4     phi_lsyncl_o
                  5     phi_readl
                  7:6   phi_srclk[1:0]
                  8     phi_frclk
                  9     phi_profile
PrintHeadCpuDir   1:0   phi_ph_data_e[0][1:0] direction control (1 - output mode, 0 - input mode)
                  3:2   phi_ph_data_e[1][1:0] direction control (1 - output mode, 0 - input mode)
                  4     phi_lsyncl_e direction control (1 - output mode, 0 - input mode)
PrintHeadCpuIn    1:0   phi_ph_data_i[0][1:0]
                  3:2   phi_ph_data_i[1][1:0]
                  4     phi_lsyncl_i
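Software driving the pins in direct CPU mode has to pack and unpack these bit fields. A minimal Python sketch of that packing, using the bit positions of Table 161; the short field names and the `pack_fields`/`unpack_fields` helpers are hypothetical.

```python
# (lsb, width) of each field in PrintHeadCpuOut, from Table 161.
CPU_OUT_FIELDS = {
    "ph_data_o_a": (0, 2),  # phi_ph_data_o[0][1:0], channel A
    "ph_data_o_b": (2, 2),  # phi_ph_data_o[1][1:0], channel B
    "lsyncl_o": (4, 1),     # phi_lsyncl_o
    "readl": (5, 1),        # phi_readl
    "srclk": (6, 2),        # phi_srclk[1:0]
    "frclk": (8, 1),        # phi_frclk
    "profile": (9, 1),      # phi_profile
}

# (lsb, width) of each field in PrintHeadCpuIn, from Table 161.
CPU_IN_FIELDS = {
    "ph_data_i_a": (0, 2),  # phi_ph_data_i[0][1:0]
    "ph_data_i_b": (2, 2),  # phi_ph_data_i[1][1:0]
    "lsyncl_i": (4, 1),     # phi_lsyncl_i
}


def pack_fields(layout: dict, **values: int) -> int:
    """Build a register word from named field values (unnamed fields are 0)."""
    word = 0
    for name, value in values.items():
        lsb, width = layout[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        word |= value << lsb
    return word


def unpack_fields(layout: dict, word: int) -> dict:
    """Split a register word into its named fields."""
    return {name: (word >> lsb) & ((1 << width) - 1) for name, (lsb, width) in layout.items()}
```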



In direct CPU control mode it is the responsibility of the CPU to drive the printhead interface pins.

Note that the following procedures are based on current printhead capabilities, and are subject to change.
32.8.1 Dead nozzle information capture 

The printhead (via the CPU direct printhead control mechanism) has the capability of testing each of the
nozzles in each printhead IC, and the resultant dead nozzle information is processed by the CPU to generate
the dead nozzle table used by the DNC.

32.8.1.1 Nozzle test procedure

The nozzle test software must first initialize the fire pattern generator for each printhead IC as normal, then
it must initialize the fire pattern register as normal. The fire pattern generator parameters must be chosen
so as to create a fire pattern where only one nozzle is firing at a time.

For example, consider a printhead constructed with a 7:3 configuration, where the left printhead IC is 7 inches
and the right 3 inches. The fire pattern length is equal to the number of dots in a half line.

The printhead shift registers are then loaded with a test pattern; any test pattern could be used. Once the
printhead shift registers are initialized the software can begin the nozzle test sequence.

The printhead is first put in FIRE_GEN mode, which resets the test circuit; both phi_srclk and phi_frclk are
held inactive. After a predetermined time the printhead is put in TEST_MODE, where the nozzle is tested: the
test software toggles the phi_profile output pin and then samples the test result on the phi_ph_data pin.

The test software then generates one phi_frclk pulse to advance the fire pattern and repeats the profile toggle
and result sampling for each nozzle in the dot line; the sequence is repeated 2 times, once for each half.
The dead nozzle software collates all the nozzle test results and produces the dead nozzle table for use by
the DNC.



Figure 241. Nozzle Test Modes & Setup (mode sequence, including FIRE_GEN, NORMAL and TEST_MODE, shown
against phi_readl; phi_ph_data carries the fire init data, then the test pattern data, then the nozzle test
result; phi_srclk, phi_frclk and phi_profile are shown; the test is repeated for each nozzle)



32.8.2 Temperature capture 

Occasionally the CPU will need to sample the printhead temperature and possibly adjust the firing profile 
based on the result. 

To capture the printhead temperature, the printhead must be put into TEST_MODE, and the
phi_ph_data pin into input mode. The CPU will toggle the phi_srclk and then sample phi_ph_data_i to
capture the temperature data. The cycle is repeated N times, and the N bits of data are used to generate the
printhead temperature value. The temperature capture waveform is shown in Figure 242.

The exact number of bits required (i.e. N) and the temperature value generation mechanism are currently
undefined.
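Since N and the value encoding are undefined, the sketch below only collects the raw bits. It is a hypothetical Python helper: `set_srclk` and `read_data` are caller-supplied callbacks (for example writing PrintHeadCpuOut bits 7:6 and reading PrintHeadCpuIn), and sampling after each complete phi_srclk pulse is an assumption based on Figure 242.

```python
from typing import Callable, List


def capture_temperature_bits(n_bits: int,
                             set_srclk: Callable[[int], None],
                             read_data: Callable[[], int]) -> List[int]:
    """Bit-bang n_bits of temperature data out of a printhead in TEST_MODE."""
    bits = []
    for _ in range(n_bits):
        set_srclk(1)  # assumption: the printhead presents the next bit per pulse
        set_srclk(0)
        bits.append(read_data() & 1)  # sample phi_ph_data_i
    return bits
```

Assembling the bits into a temperature value is left out, because the text above states that mechanism is not yet defined.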



Figure 242. Temperature Capture Waveform (printhead in TEST_MODE, selected via phi_lsyncl and phi_readl; on
each phi_srclk clock, Clock 0 to Clock N, the printhead presents one bit, Data 0 to Data N, on phi_ph_data_i)
32.8.3 Printhead initialization procedure

Table 162. Parameters for Fire Pattern

Parameter   Bits   Description
NLEN               Pattern length. Value defines the length of the fire pattern.
                   NLEN = N-1, where N is the pattern length.
COUNT       14     Defines the remaining number of ... to generate the Fire Pattern. It is given by
                   COUNT = (L/2) Mod N - 1, where L is the length of the printhead, or
                   COUNT = (N - ((L/2) Mod N)) Mod N - 1 for the shorter printhead.
                   Select shift register inversion bit.



See Section 32.8.4 Fire pattern generator.

If dot line margining is to be used, the dot data shift registers in the printhead must be initialized to zero
before any line is printed (see Section 32.6 Dot line margin for a description of dot line margin setup). The
CPU does this by shifting zero dot data into the printhead dot data shift registers DotMargin times for each
printhead IC before allowing printing to begin.

Software is responsible for calculating these parameters and generating the initialization sequence.
32.9 Implementation

32.9.1 Definitions of I/O 



Table 163. Printhead interface I/O definition

Port name                  Pins  I/O  Description
Clocks and Resets
pclk                       1     In   System clock
phiclk                     1     In   Printhead interface clock (doclk/3) used to transfer data from pclk to
                                      doclk domains
doclk                      1     In   Data out clock (2x pclk) used to transfer data to printhead
prst_n                     1     In   System reset, synchronous active low. Synchronous to pclk
phirst_n                   1     In   System reset, synchronous active low. Synchronous to phiclk
dorst_n                    1     In   System reset, synchronous active low. Synchronous to doclk
General
phi_icu_print_rdy          1     Out  Indicates that the first line of data is transferred to the printhead.
                                      Active high.
phi_icu_page_finish        1     Out  Indicates that data for a complete page has transferred. Active high.
phi_icu_underrun           1     Out  Indicates the PHI has detected a buffer underrun. Active high.
phi_icu_linesync_int       1     Out  Indicates the PHI has detected LineSyncInterrupt number of line syncs.
Debug
debug_data_out[2:0]        3     In   Output debug data to be muxed on to the PHI pins
debug_cntrl[2:0]           3     In   Control signal for each PHI bound debug data line indicating whether or
                                      not the debug data should be selected by the pin mux
LLU Interface
llu_phi_data[1:0][5:0]     2x6   Out  Dot data from LLU to the PHI; each bit is a color plane, 5 downto 0.
                                      Bus 0 - Even dot data stream
                                      Bus 1 - Odd dot data stream
                                      Data is active when the corresponding bit is active in the llu_phi_avail bus
phi_llu_ready[1:0]         2     In   Indicates that the PHI is ready to accept data from the LLU
                                      0 - Even dot data stream
                                      1 - Odd dot data stream
llu_phi_avail[1:0]         2     Out  Indicates valid data present on the corresponding llu_phi_data.
                                      0 - Even dot data stream
                                      1 - Odd dot data stream
Printhead Interface
phi_ph_data_i[1:0][1:0]    2x2   In   Dot data input from printhead.
                                      Bus 0 - Printhead channel A
                                      Bus 1 - Printhead channel B
phi_ph_data_o[1:0][1:0]    2x2   Out  Dot data output to printhead. Each bus to each printhead contains 2 bits
                                      of data.
                                      Bus 0 - Printhead channel A
                                      Bus 1 - Printhead channel B
phi_ph_data_e[1:0][1:0]    2x2   Out  Dot data direction control. Pin is driving when high.
                                      Bus 0 - Printhead channel A
                                      Bus 1 - Printhead channel B
phi_srclk[1:0]             2     Out  Dot data shift clock used to clock in printhead data
                                      Bus 0 - Printhead channel A
                                      Bus 1 - Printhead channel B


Doc: SoPEC_hardware_design 
Version: 2.3 



S3 Proprietary Document 



29 Nov 2002 

Page 513 



SoPEC : Hardware Design 



Table 163. Printhead interface I/O definition (continued)

Port name                  Pins  I/O  Description
phi_readl                  1     Out  Common printhead mode control. Used in conjunction with phi_lsyncl to
                                      determine the printhead mode
                                      0 - SoPEC receiving, printhead driving
                                      1 - SoPEC driving, printhead receiving
phi_frclk                  1     Out  Common fire pattern clock, needs to toggle once per fire cycle
phi_profile                1     Out  Common pulse profile for all colors
phi_lsyncl_o               1     Out  Capture dot data for next print line, output mode
phi_lsyncl_e               1     In   phi_lsyncl output enable; when high the phi_lsyncl pin is driving
phi_lsyncl_i               1     In   Line sync pulse from master SoPEC
PCU Interface
pcu_phi_sel                1     In   Block select from the PCU. When pcu_phi_sel is high both pcu_adr and
                                      pcu_dataout are valid.
pcu_rwn                    1     In   Common read/not-write signal from the PCU.
pcu_adr[7:2]               6     In   PCU address bus. Only 6 bits are required to decode the address space for
                                      this block.
pcu_dataout[31:0]          32    In   Shared write data bus from the PCU.
phi_pcu_rdy                1     Out  Ready signal to the PCU. When phi_pcu_rdy is high it indicates the last
                                      cycle of the access. For a write cycle this means pcu_dataout has been
                                      registered by the block and for a read cycle this means the data on
                                      phi_pcu_data is valid.
phi_pcu_data[31:0]         32    Out  Read data bus to the PCU.
32.9.2 PHI sub-block partition

Figure 243. PHI block partition (sub-blocks: PEP Controller Unit, Datapath Unit, Fire Generator, Sync and
CPU IO Control, with connections to the ICU, the master SoPEC and the bi-lithic printhead, and
debug_cntrl/debug_data_out inputs; clock domains: pclk (160 MHz), doclk (320 MHz) and phiclk (106 MHz))



32.9.3 Configuration registers 
The configuration registers in the PHI are programmed via the PCU interface. Refer to section 21.8.2 on
page 257 for a description of the protocol and timing diagrams for reading and writing registers in the PHI.
Note that since addresses in SoPEC are byte aligned and the PCU only supports 32-bit register reads and
writes, the lower 2 bits of the PCU address bus are not required to decode the address space for the PHI.
When reading a register that is less than 32 bits wide, zeros should be returned on the upper unused bit(s)
of phi_pcu_data. Table 164 lists the configuration registers in the PHI.
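The address decode and read-back rules above can be sketched in Python. `PHI_REGISTERS` is a hypothetical subset of Table 164 (addresses and widths copied from it), and `pcu_word_address` and `read_register` are illustrative helpers, not the PCU implementation.

```python
# Subset of Table 164: byte address -> (register name, width in bits).
PHI_REGISTERS = {
    0x00: ("Reset", 1),
    0x04: ("Go", 1),
    0x08: ("PageLenLine", 32),
    0x0C: ("PrintStart", 1),
    0x5C: ("LsyncLow", 16),
    0x68: ("LsyncMinPeriod", 24),
    0x74: ("SrclkPre", 14),
}


def pcu_word_address(byte_addr: int) -> int:
    """Value carried on pcu_adr[7:2] for a byte address in the PHI space."""
    if byte_addr & 0x3:
        raise ValueError("PCU accesses are 32-bit and word aligned")
    return (byte_addr >> 2) & 0x3F


def read_register(storage: dict, byte_addr: int) -> int:
    """32-bit read: the unused upper bits of a narrow register read as zero."""
    name, width = PHI_REGISTERS[byte_addr]
    return storage.get(name, 0) & ((1 << width) - 1)
```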



Table 164. PHI registers description

Address     Register            Bits   Reset        Description
Control Registers
0x00        Reset               1      0x1          Active low synchronous reset, self de-activating. A
                                                    write to this register will cause a PHI block reset.
0x04        Go                  1      0x0          Active high bit indicating the PHI is programmed
                                                    and ready to use. A low to high transition will cause
                                                    PHI block internal state to reset. Will be automatically
                                                    reset if a page finish or a buffer underrun is detected.
General Control
0x08        PageLenLine         32     0x0000_0000  Specifies the number of dot lines in a page.
0x0C        PrintStart          1      0x0          A low to high transition triggers printing to start.
                                                    Only active in Master Mode.
0x10-0x14   DotMargin           2x16   0x0000       Specifies for each printhead IC the width of the
                                                    margin in dots divided by 2.
                                                    0 - Printhead IC Channel A
                                                    1 - Printhead IC Channel B
0x18-0x2C   DotCount[5:0]       6x32   0x0000_0000  Indicates the number of dots used for a particular
                                                    color, where N specifies a color from 0 to 5. Value
                                                    valid after a write access to DotCountSnap.
0x30        DotCountSnap        1      0x0          Write access causes the AccumDotCount values to
                                                    be transferred to the DotCount registers. The
                                                    AccumDotCount registers are reset afterwards.
0x34        PhiHeadSwap         1      0x0          Controls which signals are connected to printhead
                                                    channels A and B
                                                    0 - Normal, specifies bit 0 is channel A, bit 1 is
                                                    channel B
                                                    1 - Swapped, specifies bit 0 is channel B, bit 1 is
                                                    channel A
0x38        PhiMode             1      0x0          Indicates whether the PHI is operating in master or
                                                    slave mode
                                                    0 - Slave Mode
                                                    1 - Master Mode
0x3C-0x40   PhiSerialOrder      2x1    0x0          Specifies the serialization order of dots before
                                                    transfer to the printhead.
                                                    Bus 0 - Printhead Channel A
                                                    Bus 1 - Printhead Channel B
                                                    A 0 indicates order ABC, while 1 indicates CBA
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mmm 








0x44-0x48 


PrintHeadSize 


2x16 


0x0000 


Specifies the number of non-nnargin dots in the 

printhead ICs. If margining is to be used then the 

configured PrintHeadSize should be adjusted by the 

dot margin value i.e. PfUttHeadSixe & {Pnysicai- 

Pn'ntHeadSae - [DotMartfn * 2)). 

Bus 0 - Specifies prfnihead on Channel A 

Bus 1 - Specifies printhead on Channel B 


CPU Direct PHI Control (See Table 161.) 


0x4C 


PrintHeadCpuIn 


5 


0x00 


PHI internee pins Input status. Only active in direct 
CPU mode 


0x50 


PnntHeadCpuDir 


5 


0x00 


PHI Interface pins direction control. Only active in 
direct CPU mode 


0x54 


PiintHeadCpuOut 


10 


0x000 


PHI internee pins output control. Onty active in 
(firect CPU mode 


0x58 


PrintHeadCpuCtrt 


1 


0x0 


Control direct access CPU access to the PHI pins 

0 - Normal Mode 

1 - Direct CPU Control mode 


Line Sync Control

0x5C | LsyncLow | 16 | 0x0000 | Number of phiclk cycles phi_lsyncl should remain low.

0x60 | LsyncHigh | 16 | 0x0000 | Number of phiclk cycles phi_lsyncl should remain high.

0x64 | LsyncPre | 16 | 0x0000 | Number of phiclk cycles between the PrintStart rising transition and the generated phi_lsyncl falling edge.

0x68 | LsyncMinPeriod | 24 | 0x00_0000 | Minimum number of phiclk cycles between Lsync pulses. Lsync pulses of a shorter period will be rejected. Only used in slave mode.

0x6C | LsyncDeglitchCnt | 4 | 0x3 | Number of phiclk cycles to filter the incoming Lsync pulse from the master. Only used in slave mode.

0x70 | LineSyncInterrupt | 16 | 0x0000 | Number of line syncs to occur before generating an interrupt. When set to zero the interrupt is disabled.


Shift Register Control

0x74 | SrclkPre | 14 | 0x0000 | Number of phiclk cycles between the phi_lsyncl falling edge and phi_srclk pulse generation, or printhead data transfer.

0x78 | SrclkPost | 14 | 0x0000 | Number of phiclk cycles of margin allowed from the last srclk pulse in a line to the next line sync.

0x7C-0x80 | PrintHeadRate[1:0] | 2x16 | 0xFFFF | Specifies the active to inactive ratio of phi_srclk for the printhead ICs. A 1 indicates active.
    Bus 0 - Printhead IC channel A
    Bus 1 - Printhead IC channel B

0x84 | DotOrderMode | 1 | 0x0 | Specifies the dot transmit order to printhead Channel A. Printhead Channel B always uses the opposing order.
    0 - Even before Odd dots
    1 - Odd before Even dots


Fire Control

0x88 | ProfilePre | 14 | 0x0000 | Number of phiclk cycles between the phi_lsyncl falling edge and phi_profile pulse generation.
Table 164. PHI registers description

0x8C | ProfileLow | 14 | 0x0000 | Number of phiclk cycles phi_profile should remain low.


0x90 | ProfileHigh | 14 | 0x0000 | Number of phiclk cycles phi_profile should remain high.

0x94 | ProfileNum | 16 | 0x0000 | Number of profile pulses per line time.

0x98 | FrclkPre | 14 | 0x0000 | Number of phiclk cycles between the phi_lsyncl falling edge and phi_frclk pulse generation.

0x9C | FrclkLow | 14 | 0x0000 | Number of phiclk cycles phi_frclk should remain low.

0xA0 | FrclkHigh | 14 | 0x0000 | Number of phiclk cycles phi_frclk should remain high.

0xA4 | FrclkNum | 16 | 0x0000 | Number of phi_frclk pulses per line time.


Working Registers

0xA8-0xAC | LineDotCnt | 2x16 | 0x0000 | Indicates the number of dots processed in the current line.
    Bus 0 - Printhead Channel A
    Bus 1 - Printhead Channel B
    (Read Only Registers)

0xB0 | LineCnt | 32 | 0x0000_0000 | Indicates the number of lines processed in this page. (Read Only Register)
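The margin adjustment in the PrintHeadSize description is simple arithmetic, but it has to be applied to both buses whenever margining is used. The Python sketch below computes the value to program. The function name is hypothetical, the range check only reflects the 16-bit width of each bus, and the example sizes are illustrative rather than taken from a real printhead.

```python
def configured_print_head_size(physical_size: int, dot_margin: int) -> int:
    """Return PrintHeadSize = PhysicalPrintHeadSize - (DotMargin * 2).

    Hypothetical helper; the subtraction follows the register description and
    the range check reflects the 16-bit width of each PrintHeadSize bus.
    """
    size = physical_size - dot_margin * 2
    if not 0 <= size <= 0xFFFF:
        raise ValueError(f"PrintHeadSize {size} does not fit the 16-bit field")
    return size
```

For example, an illustrative printhead with 1280 physical dots and a DotMargin of 8 would be programmed with `configured_print_head_size(1280, 8)`, i.e. 1264.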



The configuration registers are clocked at pclk rates, but several blocks in the PHI are clocked by different and asynchronous clocks. Configuration values are not re-synchronized, so it is important that the Go register be set to zero while configuration values are updated. This prevents logic from entering unknown states due to metastable clock domain transfers.

Some registers can be written to at any time, such as the direct CPU control registers (PrintHeadCpuIn, PrintHeadCpuDir, PrintHeadCpuOut and PrintHeadCpuCtrl), the Go register and the PrintStart register. All registers can be read at any time.
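The write-ordering rule above can be captured in a small software model, for example in driver unit tests. This is a sketch under stated assumptions: `PhiConfigModel` and `reconfigure` are hypothetical names, registers are addressed by name rather than offset, and the model simply refuses writes to non-exempt registers while Go is 1. It leaves Go at 0 after reconfiguring, because the text does not say how printing should be restarted.

```python
# Registers the text lists as safe to write at any time.
ALWAYS_WRITABLE = {
    "PrintHeadCpuIn", "PrintHeadCpuDir", "PrintHeadCpuOut",
    "PrintHeadCpuCtrl", "Go", "PrintStart",
}


class PhiConfigModel:
    """Hypothetical software model of the PHI configuration registers."""

    def __init__(self) -> None:
        self.regs = {"Go": 0}
        self.log = []  # ordered record of successful (register, value) writes

    def write(self, name: str, value: int) -> None:
        # Configuration values are not re-synchronized, so reject writes
        # to non-exempt registers while the PHI is running.
        if self.regs.get("Go", 0) == 1 and name not in ALWAYS_WRITABLE:
            raise RuntimeError(f"{name} must not be written while Go == 1")
        self.regs[name] = value
        self.log.append((name, value))


def reconfigure(cfg: PhiConfigModel, updates: dict) -> None:
    """Clear Go first, then apply the configuration writes (Go stays 0)."""
    cfg.write("Go", 0)
    for name, value in updates.items():
        cfg.write(name, value)
```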

When one of the direct CPU control registers is written, the configuration registers block generates a 2-cycle pulse (cpu_io_wr) which is used to transfer the pin control signals from the pclk domain to the phiclk domain. The cpu_io_wr signal is a delayed version of the write enable from the CPU.

32.9.4 Dot counter 

The dot counter keeps a running count of the number of dots fired for each color plane. The counters are 32 bits wide and will saturate. When the CPU wants to read the dot count for a particular color plane it must write to the DotCountSnap register. This causes all 6 running counter values to be transferred to the DotCount registers in the configuration registers block. The running counter values are reset when they are snapped:

// reset if being snapped
if (dot_cnt_snap == 1) then {
    dot_count[5:0] = accum_dot_count[5:0]
    accum_dot_count[5:0] = 0
}

// update the counts (each increment saturates at 0xFFFF_FFFF)
for (color = 0; color < 6; color++) {
    // data valid, first dot stream
    data_valid = ((phi_llu_ready[0] == 1) AND (llu_phi_avail[0] == 1))
    if ((data_valid == 1) AND (llu_phi_data[0][color] == 1) AND (accum_dot_count[color] != 0xFFFF_FFFF)) then
        accum_dot_count[color]++
    // data valid, second dot stream
    data_valid = ((phi_llu_ready[1] == 1) AND (llu_phi_avail[1] == 1))
    if ((data_valid == 1) AND (llu_phi_data[1][color] == 1) AND (accum_dot_count[color] != 0xFFFF_FFFF)) then
        accum_dot_count[color]++
}
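A cycle-level Python model of the counter behaviour described above follows. It treats the two LLU dot streams as a list of (data_valid, dot_bits) pairs, where bit n of dot_bits is the dot for color plane n, and saturates each increment separately. The class and method names are hypothetical, and applying the snapshot before the same cycle's update mirrors the order of the pseudocode rather than a confirmed RTL detail.

```python
MAX_COUNT = 0xFFFF_FFFF  # 32-bit counters saturate at all ones


class DotCounterModel:
    """Per-color saturating dot counters with a CPU snapshot (hypothetical)."""

    def __init__(self, colors: int = 6) -> None:
        self.accum = [0] * colors      # running counts (accum_dot_count)
        self.dot_count = [0] * colors  # snapped values read by the CPU

    def snap(self) -> None:
        """Model a DotCountSnap write: copy running counts, then clear them."""
        self.dot_count = list(self.accum)
        self.accum = [0] * len(self.accum)

    def clock(self, streams) -> None:
        """Advance one phiclk cycle for a list of (data_valid, dot_bits) pairs."""
        for data_valid, dot_bits in streams:
            if not data_valid:
                continue  # stream not ready or nothing available this cycle
            for color in range(len(self.accum)):
                if (dot_bits >> color) & 1 and self.accum[color] != MAX_COUNT:
                    self.accum[color] += 1
```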

32.9.5 Sync generator

The sync generator logic has two modes of operation, master and slave mode. In master mode (configured by the PhiMode register) it generates the lsyncl_o output based on configured values and control triggers from the PHI controller. In slave mode it de-glitches the incoming lsyncl_i signal, and filters the lsyncl signal with the minimum configured period.




Machine remains in same state by default
All outputs are zero unless otherwise stated

State Description:
Reset: Normal reset state
SyncPre: Count the LsyncPre number of clock cycles
SyncLow: Count the LsyncLow number of clock cycles
SyncHigh: Count the LsyncHigh number of clock cycles
SyncWait: Wait for an input lsync pulse
SyncPeriod: Count the LsyncMinPeriod number of clock cycles

Figure 244. Sync generator state diagram 



After reset or a pulse on phi_go_pulse the machine returns to the Reset state, regardless of what state it's 
currently in. 

The state machine waits until it is enabled (sync_en == 1) by the PHI controller state machine. When enabled it proceeds to the SyncPre or SyncWait state depending on whether the state machine is configured in master or slave mode. In master mode it generates the lsyncl pulses; in slave mode it receives and filters the lsyncl pulses from the master sync generator.

On transition to the SyncPre state a counter is loaded with the LsyncPre value, and while in the SyncPre state the counter is decremented. When the count is zero the machine proceeds to the SyncLow state, pulsing the line_st signal on transition and loading the counter with the LsyncLow value. This indicates to the PHI controller the line start aligned to the lsyncl negative edge.
The machine waits in the SyncLow state until the counter has decremented to zero. It then proceeds to the SyncHigh state and counts LsyncHigh number of cycles. While in the SyncLow state the lsyncl_o output is set to 0, and in SyncHigh the lsyncl_o output is set to 1.

When the count is zero and the current line is not the last (last_line == 0), the machine returns to the SyncLow state to begin generating a new line sync pulse. The transition pulses the line_st signal to the PHI controller.

The loop is repeated until the current line is the last (last_line == 1), and the machine returns to the Reset state to wait for the next page start.

In slave mode the state machine proceeds to the SyncWait state when enabled. It waits in this state until an lsync_pulse is received from the input de-glitch circuit. When a pulse is detected the machine jumps to the SyncPeriod state and begins counting down the LsyncMinPeriod number of clock cycles before returning to the SyncWait state. On transition from the SyncWait to the SyncPeriod state the line_st signal to the PHI controller is pulsed to indicate the line start. While in the SyncPeriod state, if an lsync_pulse is detected the state machine will signal a sync error (via sync_err) to the PHI controller and cause a buffer underrun interrupt.
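Both modes can be checked with a behavioural Python sketch. It is not cycle-exact RTL: it assumes each state lasts exactly its configured count (all counts at least 1), that lsyncl_o is 1 outside SyncLow, and that one SyncLow/SyncHigh pair is generated per line. The slave model takes cycle numbers of already de-glitched pulses and flags any pulse that lands inside the LsyncMinPeriod window as a sync error. The function names are hypothetical.

```python
def master_sync(lsync_pre, lsync_low, lsync_high, num_lines):
    """Master mode: return (line_st cycle indices, lsyncl_o level per cycle)."""
    if min(lsync_pre, lsync_low, lsync_high) < 1 or num_lines < 1:
        raise ValueError("counts and num_lines must be >= 1")
    wave = [1] * lsync_pre              # SyncPre: count LsyncPre cycles
    line_starts = []
    for _ in range(num_lines):
        line_starts.append(len(wave))   # line_st pulses on entry to SyncLow
        wave += [0] * lsync_low         # SyncLow: lsyncl_o = 0
        wave += [1] * lsync_high        # SyncHigh: lsyncl_o = 1
    return line_starts, wave            # after the last line: back to Reset


def slave_sync(pulse_cycles, lsync_min_period):
    """Slave mode: return (line_st cycles, sync_err cycles)."""
    line_starts, errors = [], []
    period_end = None                   # cycle at which SyncPeriod expires
    for t in sorted(pulse_cycles):
        if period_end is not None and t < period_end:
            errors.append(t)            # pulse during SyncPeriod: sync_err
        else:
            line_starts.append(t)       # SyncWait -> SyncPeriod: pulse line_st
            period_end = t + lsync_min_period
    return line_starts, errors
```

With LsyncPre = 2, LsyncLow = 3 and LsyncHigh = 4, for example, `master_sync(2, 3, 4, 2)` places line starts at cycles 2 and 9, i.e. one line every LsyncLow + LsyncHigh cycles after the initial LsyncPre delay.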



32.9.5.1 Lsyncl input de-glitch

The lsyncl_i input is considered an asynchronous input to the PHI, and is passed through a synchronizer to reduce the possibility of metastable states occurring before being passed to the de-glitch logic.

The input de-glitch logic rejects input states of duration less than the configured number of clock cycles (lsync_deglitch_cnt). Input states of greater duration are reflected on the output, and are negative edge detected to produce the lsync_pulse signal to the main generator state machine. The counter logic is given by:

if (lsync_i != lsync_i_delay) then
    cnt = lsync_deglitch_cnt
    output_en = 0
elsif (cnt == 0) then
    cnt = cnt
    output_en = 1
else
    cnt--
    output_en = 0



(Block diagram elements: lsyncl_i synchronizer, lsync_i_delay register, counter logic loaded from lsync_deglitch_cnt, compare stage and pulse generator, with signals output_en and lsync_pulse.)

Figure 245. Line sync de-glitch RTL diagram 
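The counter logic above can be exercised in software. The Python sketch follows the pseudocode line by line on a list of synchronized input samples, one per phiclk cycle; the power-up values of lsync_i_delay and of the filtered output are assumptions, and the function name is hypothetical.

```python
def deglitch(samples, deglitch_cnt):
    """Return (filtered level per cycle, cycles at which lsync_pulse fires)."""
    prev = samples[0]   # lsync_i_delay; assumed to power up equal to the input
    out = samples[0]    # filtered level; same power-up assumption
    cnt = deglitch_cnt
    filtered, pulses = [], []
    for t, level in enumerate(samples):
        if level != prev:          # input changed: restart the filter count
            cnt = deglitch_cnt
            output_en = False
        elif cnt == 0:             # stable long enough: pass the level through
            output_en = True
        else:                      # still counting down
            cnt -= 1
            output_en = False
        prev = level
        if output_en and level != out:
            if out == 1 and level == 0:
                pulses.append(t)   # negative edge produces lsync_pulse
            out = level
        filtered.append(out)
    return filtered, pulses
```

With lsync_deglitch_cnt = 3, a clean falling edge at cycle 3 is passed through and produces lsync_pulse at cycle 7, while a two-cycle low glitch is rejected entirely.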



32.9.5.2 Line Sync interrupt logic

The line sync interrupt logic counts the number of line syncs that occur (either internally or externally generated line syncs) and determines whether to generate an interrupt. The number of line syncs it counts before an interrupt is generated is configured by the LineSyncInterrupt register. The interrupt is disabled if LineSyncInterrupt is set to zero.
// Implement the interrupt counter
if (phi_go_pulse == 1) then
    line_count = 0
elsif ((line_st == 1) AND (line_count == 0)) then
    line_count = linesync_int
elsif ((line_st == 1) AND (line_count != 0)) then
    line_count--
// determine when to pulse the interrupt
if (linesync_int == 0) then // interrupt disabled
    phi_icu_linesync_int = 0
elsif ((line_st == 1) AND (line_count == 0)) then
    phi_icu_linesync_int = 1
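The interrupt spacing that falls out of this counter can be checked with a small Python sketch. One assumption needs stating loudly: the comparison value in the final `elsif` is illegible in the source and is taken here to be `line_count == 0`. The interrupt test uses line_count as it was before the same cycle's update, as in registered logic. Under that assumption the interrupt fires on the first line sync after Go and then once every LineSyncInterrupt + 1 line syncs; a different comparison value would change that spacing. The function name is hypothetical.

```python
def linesync_interrupt_lines(num_line_syncs, linesync_int):
    """Return the indices of line_st pulses on which the interrupt fires."""
    line_count = 0                       # cleared by phi_go_pulse
    fired = []
    for n in range(num_line_syncs):      # one iteration per line_st pulse
        # interrupt decision uses the pre-update value (assumed == 0 test)
        if linesync_int != 0 and line_count == 0:
            fired.append(n)
        # counter update
        if line_count == 0:
            line_count = linesync_int    # reload from LineSyncInterrupt
        else:
            line_count -= 1
    return fired
```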



32.9.6 Fire generator 



The fire generator block creates the signal profile for the phi_frclk and phi_profile signals to the printhead. The profile is based on configured values and is timed in relation to the fire_st pulse from the PHI controller block.



Machine remains in same state by default
All outputs are zero unless otherwise stated

State Description:
Reset: Normal reset state
FirePre: Count the FrclkPre number of clock cycles, repeat count set to FrclkNum
FireHigh: Count the FrclkHigh number of clock cycles
FireLow: Count the FrclkLow number of clock cycles

Figure 246. Fire generator state diagram 

The fire generator consists of 2 identical state machines for creating the phi_frclk and phi_profile signals respectively.

The machine is reset to the Reset state when phi_go_pulse or the reset is active, regardless of the current state.

The machine waits in the Reset state until it receives a fire_st pulse from the PHI controller. The controller will generate a fire_st pulse at the beginning of each dot line. On the state transition the cycle counter is loaded with the FrclkPre value and the repeat counter is loaded with the FrclkNum value.

The state machine waits in the FirePre state until the cycle counter is zero, after which it jumps to the FireHigh state and loads the cycle counter with the FrclkHigh value. Again the state machine waits until the count is zero and then proceeds to the FireLow state. On transition the cycle counter is loaded with the FrclkLow value. The state machine waits in the FireLow state while the cycle counter is decremented.

When the cycle counter reaches zero and the repeat_count is non-zero, the repeat_count is decremented, the cycle counter is loaded with the FrclkHigh value and the state machine jumps to the FireHigh state to repeat the generation cycle. The loop is repeated until the repeat_count is zero, at which point the state machine goes to the Reset state and waits for the next fire_st pulse.

When in the Reset state the fire_rdy signal is active to indicate to the controller that the fire generator is ready.
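The following Python sketch generates the phi_frclk waveform for one fire_st pulse by following the state descriptions literally, assuming each state lasts exactly its configured count. Read that way, one FireHigh/FireLow pass happens before the repeat counter is first tested, so the description yields FrclkNum + 1 high pulses per line. The sketch reproduces that reading so it can be checked against the FrclkNum register definition ("Number of phi_frclk pulses per line time"). The function names are hypothetical.

```python
def fire_waveform(frclk_pre, frclk_high, frclk_low, frclk_num):
    """Return the phi_frclk level per phiclk cycle after one fire_st pulse."""
    wave = [0] * frclk_pre               # FirePre: output low
    repeat_count = frclk_num             # loaded on leaving Reset
    while True:
        wave += [1] * frclk_high         # FireHigh
        wave += [0] * frclk_low          # FireLow
        if repeat_count == 0:            # return to Reset, fire_rdy active
            break
        repeat_count -= 1                # repeat the FireHigh/FireLow cycle
    return wave


def count_rising_edges(wave):
    """Count 0 -> 1 transitions, treating the level before cycle 0 as 0."""
    return sum(1 for a, b in zip([0] + wave, wave) if a == 0 and b == 1)
```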



32.9.7 PHI controller

The PHI controller is responsible for controlling all functions of the PHI block on a line by line basis. It controls and synchronizes the sync generator, the fire generator and the datapath unit, as well as signalling the PHI status back to the CPU. It also contains a line counter to determine when a full page has completed printing.



(State diagram: states Reset, FirstLine, PrintStart, SyncWait, LineTrans, LastLine and Underrun.)

Figure 247. PHI controller state machine 

The PHI controller state machine is reset to the Reset state by a reset or phi_go_pulse == 1.

It will remain in reset until the block is enabled by phi_go == 1. Once enabled, the state machine will jump to the FirstLine state, trigger the transfer of one line of data to the printhead (data_st = 1) and initialize the line counter to the page length (PageLenLine). Once the line is transferred (data_fin from the datapath unit) the machine will go to the PrintStart state and signal the CPU, using an interrupt, that the PHI is ready to begin printing (phi_icu_print_rdy). The line counter will also be decremented. It will then wait in the PrintStart state until the CPU acknowledges the print ready signal and enables printing by writing to the PrintStart register.

The state machine proceeds to the SyncWait state and waits for a line start condition (line_st == 1). The line start condition is different depending on whether the PHI is configured as being in a master or slave SoPEC (the PhiMode register). In either case the sync generator determines the correct line start source and signals the PHI controller via the line_st signal. Once it is received, the machine proceeds to the LineTrans state, with the transition triggering the fire generator to start (fire_st), the datapath unit to start (data_st) and the sync generator to start (sync_st).

While in the LineTrans state the fire, sync and datapath units will be producing line data. When it has finished processing a line, the datapath unit will assert the line finished (line_fin) signal. If the line counter is not equal to 1 (i.e. not the last line) the state machine will jump back to the SyncWait state and wait for the start condition for the next line, and the line counter will be decremented. If the line counter is one then the machine will proceed to the LastLine state.

The LastLine state generates one more line of fire pulses to print the last line held in the shift registers of the printhead. Once complete (fire_fin == 1) the state machine returns to the Reset state and waits for the next page of data. On page completion the state machine generates a phi_icu_page_finish interrupt to signal to the CPU that the page has completed. The phi_icu_page_finish interrupt will also cause the Go register to reset automatically.

While the state machine is in the LineTrans state (or in the FirstLine state and the PHI is in slave mode) and waiting for the datapath unit to complete line processing, it is possible (e.g. due to an excessive PEP stall) that a new line start condition occurs but the datapath unit is not ready. In this case an underrun error is generated. The state machine goes to the Underrun state and generates a phi_icu_underrun interrupt to the CPU. The PHI cannot recover from a buffer underrun error; the CPU must reset the PEP blocks and restart printing. The phi_icu_underrun interrupt will also cause the Go register to reset automatically.
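The line accounting in the PHI controller is easy to get off by one, because the first line is loaded before PrintStart and the last line is fired in LastLine without loading new data. The Python sketch below walks the happy path (no underrun) and counts lines loaded and fired. It is a behavioural model only: the function name is hypothetical, and it requires PageLenLine >= 2 because the text does not describe a one-line page.

```python
def phi_page_sequence(page_len_line):
    """Return (state sequence, lines loaded, lines fired) for one page."""
    if page_len_line < 2:
        raise ValueError("sketch assumes PageLenLine >= 2")
    states = ["FirstLine"]
    line_count = page_len_line   # initialized on entry to FirstLine
    loaded, fired = 1, 0         # FirstLine transfers one line (data_st)
    line_count -= 1              # decremented when data_fin returns
    states.append("PrintStart")  # wait for the CPU to write PrintStart
    while True:
        states += ["SyncWait", "LineTrans"]
        fired += 1               # fire_st: print the line in the shift registers
        loaded += 1              # data_st: load the next line
        if line_count != 1:
            line_count -= 1      # not the last line: back to SyncWait
        else:
            break
    states.append("LastLine")
    fired += 1                   # fire the final line held in the printhead
    return states, loaded, fired
```

For a page of N lines the model loads and fires exactly N lines, passing through LineTrans N - 1 times.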



32.9.8 CPU IO control



The CPU IO control block is responsible for accepting CPU direct IO control signals from the configuration registers (at pclk frequency) and transferring them to the phiclk domain. It also accepts the input signals from the printhead and re-synchronizes them to the pclk domain, and accepts debug signals from the RDU and muxes them to output pins.

Table 161 contains the direct mapping of configuration registers to printhead IO pins. Direct CPU control is enabled only when PrintHeadCpuCtrl is set to one. In normal operation (i.e. PrintHeadCpuCtrl == 0) the printhead data pins are always in output mode (phi_ph_data_e = 1), phi_lsyncl will be in output mode if the SoPEC is the master (i.e. phi_lsyncl_e = phi_mode), and readl will be set high.

The pseudocode for the CPU IO control is:

if (printhead_cpu_ctrl == 1) then // CPU access enabled
    // outputs
    phi_ph_data_o[0][1:0] = printhead_cpu_out[1:0]
    phi_ph_data_o[1][1:0] = printhead_cpu_out[3:2]
    phi_lsyncl_o = printhead_cpu_out[4]
    phi_readl = printhead_cpu_out[5]
    phi_srclk[1:0] = printhead_cpu_out[7:6]
    phi_frclk = printhead_cpu_out[8]
    phi_profile = printhead_cpu_out[9]
    // direction control
    phi_ph_data_e[0][1:0] = printhead_cpu_dir[1:0]
    phi_ph_data_e[1][1:0] = printhead_cpu_dir[3:2]
    phi_lsyncl_e = printhead_cpu_dir[4]
    // input assignments
    printhead_cpu_in[1:0] = synchronize(phi_ph_data_i[0][1:0])
    printhead_cpu_in[3:2] = synchronize(phi_ph_data_i[1][1:0])
    printhead_cpu_in[4] = synchronize(phi_lsyncl_i)
else // normal connections
    // outputs
    phi_ph_data_o[0][1:0] = ph_data[0][1:0]
    phi_ph_data_o[1][1:0] = ph_data[1][1:0]
    phi_lsyncl_o = lsync_o
    phi_readl = 1
    phi_srclk[1:0] = srclk[1:0]
    phi_frclk = frclk
    phi_profile = profile
    // direction control
    phi_ph_data_e[0][1:0] = 0x3
    phi_ph_data_e[1][1:0] = 0x3
    phi_lsyncl_e = phi_mode // depends on Master or Slave mode

// inputs
lsyncl_i = phi_lsyncl_i // connected regardless

// debug overrides any other connections
if (debug_cntrl[0] == 1) then
    phi_frclk = debug_data_out[0]
    phi_readl = pclk
if (debug_cntrl[1] == 1) then
    phi_profile = debug_data_out[1]
if (debug_cntrl[2] == 1) then
    phi_lsyncl_o = debug_data_out[2]
    phi_lsyncl_e = 1



The debug signalling is controlled by the RDU block (see Section 11.8 Realtime Debug Unit (RDU)); the IO control in the PHI muxes debug data onto the PHI pins based on the control signals from the RDU.
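When the pins are driven from software in direct CPU mode (PrintHeadCpuCtrl = 1), firmware has to pack the pin levels into the 10-bit PrintHeadCpuOut register. The Python sketch below encodes and decodes that register using the bit positions from the pseudocode above. The helper names and the dictionary keys are hypothetical conveniences, not part of the specification.

```python
# (signal, lsb, width) for PrintHeadCpuOut, from the direct-control pseudocode
CPU_OUT_FIELDS = [
    ("phi_ph_data_o[0]", 0, 2),  # bits 1:0
    ("phi_ph_data_o[1]", 2, 2),  # bits 3:2
    ("phi_lsyncl_o", 4, 1),      # bit 4
    ("phi_readl", 5, 1),         # bit 5
    ("phi_srclk", 6, 2),         # bits 7:6
    ("phi_frclk", 8, 1),         # bit 8
    ("phi_profile", 9, 1),       # bit 9
]


def decode_cpu_out(value):
    """Split a PrintHeadCpuOut value into its pin fields."""
    return {name: (value >> lsb) & ((1 << width) - 1)
            for name, lsb, width in CPU_OUT_FIELDS}


def encode_cpu_out(fields):
    """Pack pin levels into a PrintHeadCpuOut value (missing fields are 0)."""
    value = 0
    for name, lsb, width in CPU_OUT_FIELDS:
        field = fields.get(name, 0)
        if field >> width:
            raise ValueError(f"{name}={field} does not fit in {width} bit(s)")
        value |= field << lsb
    return value
```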
32.9.9 Datapath Unit



(Block diagram: dot data from the Line Loader Unit (LLU) enters the datapath unit.)

Figure 248. Datapath Unit partition
32.9.10 Dot order controller

Machine remains in same state by default
All outputs are zero unless otherwise stated

State Description:
Reset: Normal reset state
LineStart: Start processing first part of the line, wait for both mid_pt to be active
LineMid: Switch over wait state, allow pipeline to clear
LineEnd: Line end processing, wait for both line_fin to be active



Figure 249. Dot Order controller state diagram 



The dot order controller is responsible for controlling the dot order blocks. It monitors the status of each 
block and determines the switch over point, at which the connections from odd and even dot streams to 
printhead channels are swapped. 

The machine is reset to the Reset state when phi_go_pulse == 1 or the reset is active. The machine will wait until it receives a data_st pulse from the PHI controller before proceeding to the LineStart state. On the transition to the LineStart state it will reset the dot counter in each dot order block via the dot_cnt_rst signal.

While in the LineStart state both dot order blocks are enabled (gen_en = 1). The dot order blocks process data until each of them reaches its mid point. The mid point of a line is defined by the configured printhead size (i.e. print_head_size). When a dot order block reaches the mid point it immediately stops processing and waits for the remaining dot order block. When both dot order blocks are at the mid point (mid_pt == 11) the controller clocks through the LineMid state to allow the pipeline to empty and immediately goes to the LineEnd state.

In the LineEnd state the mode_sel is switched and the dot order blocks are re-enabled; in this state the dot order blocks read data from the opposite LLU dot data stream to the one used in the LineStart state. The controller remains in the LineEnd state until both dot order blocks have processed a line, i.e. line_fin == 11.

On completion of both blocks the controller returns to the Reset state and again awaits the next data_st pulse from the PHI controller. When in the Reset state the machine signals the PHI controller that it is ready to begin processing dot data via the dot_order_rdy signal.

The dot order controller selects which dot streams should feed which printhead channels. The order can be changed by configuring the DotOrderMode register. In all cases Channel A and Channel B must be in opposing dot order modes. Table 165 shows the possible modes of operation.

Table 165. Mode selection in Dot order controller.

A | 0 | 0 | Even before Odd (EBO mode), even dot stream feeds Channel A printhead, first half line.
A | 0 | 1 | Odd before Even (OBE mode), odd dot stream feeds Channel A printhead, first half line.
A | 1 | 0 | Even before Odd (EBO mode), even dot stream feeds Channel A printhead, second half line.
A | 1 | 1 | Odd before Even (OBE mode), odd dot stream feeds Channel A printhead, second half line.
B | 0 | 0 | Odd before Even (OBE mode), odd dot stream feeds Channel B printhead, second half line.
B | 0 | 1 | Even before Odd (EBO mode), even dot stream feeds Channel B printhead, second half line.
B | 1 | 0 | Odd before Even (OBE mode), odd dot stream feeds Channel B printhead, first half line.
B | 1 | 1 | Even before Odd (EBO mode), even dot stream feeds Channel B printhead, first half line.



32.9.10.1 Dot order unit



The dot order unit accepts dot data from either dot stream from the LLU and writes the dot data into the dot buffer. It has two modes of operation, odd before even (OBE) and even before odd (EBO). In OBE mode data from the odd dot stream is accepted first, then even; in EBO mode it is vice versa. The mode is configurable by the DotOrderMode register.

The dot order unit maintains a dot count that is decremented each time a new dot is received from the LLU. The dot order controller resets the dot counter to print_head_size[15:0] at the start of a new line via the dot_cnt_rst signal. The dot count is compared with the printhead size (print_head_size[15:0] divided by 2) to determine the mid point (mid_pt), and the line finish point (line_fin) is reached when the dot counter is zero.

The mid point is defined as half the number of dots in a particular printhead, and is given by the print_head_size bus:
// define the mid point
if (dot_cnt[15:0] == print_head_size[15:1]) then
    mid_pt = 1
else
    mid_pt = 0

The dot order unit logic maintains the dot data write pointer. Each time a new dot is written to the dot buffer the write pointer is incremented. The fill level of the dot buffer is determined by comparing the read and write pointers. The fill level is used to determine when to backpressure the LLU (ready signal) due to the dot buffer filling. A suitable threshold value is determined to allow for the full LLU pipeline to empty into the dot buffer.

The dot order stalling control is given by: 

// determine the ready/avail signal to use, based on mode select
if (mode_sel == 1) then
    dot_active = llu_phi_avail[0] AND ready
    wr_data = llu_phi_data[0]
else
    dot_active = llu_phi_avail[1] AND ready
    wr_data = llu_phi_data[1]
// update the counters
if (dot_active == 1) then {
    wr_en = 1
    wr_adr++
    if (dot_cnt == 0) then
        dot_cnt = print_head_size
    else
        dot_cnt--
}
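The mid point and line finish rules can be checked by counting accepted dots. In the Python sketch below, dot_cnt is reset to print_head_size (as dot_cnt_rst does), decremented once per accepted dot, and compared with print_head_size[15:1] for mid_pt and with zero for line_fin. It ignores stalls and the reload-at-zero branch, and the function name is hypothetical.

```python
def dot_order_events(print_head_size):
    """Return the 1-based dot numbers at which mid_pt and line_fin assert."""
    dot_cnt = print_head_size          # dot_cnt_rst at line start
    half = print_head_size >> 1        # print_head_size[15:1]
    mid_at = fin_at = None
    for dot in range(1, print_head_size + 1):
        dot_cnt -= 1                   # one dot accepted from the LLU
        if mid_at is None and dot_cnt == half:
            mid_at = dot               # mid_pt: stop and wait for the other block
        if dot_cnt == 0:
            fin_at = dot               # line_fin
            break
    return mid_at, fin_at
```

For an even printhead size the switch-over happens after exactly half the dots; for an odd size, the floor in print_head_size[15:1] places it one dot past the true half.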

The dot writer needs to determine when to stall the LLU dot data stream. A number of factors could stall the dot stream in the LLU, such as the buffer filling, waiting for the mid point, waiting for the line finish, or the dot order controller waiting for the line start condition from the PHI controller.

The stall logic is given by: 

// determine when to stall the LLU generator
fill_level = wr_adr - rd_adr

if (fill_level > (32 - THRESHOLD)) then // THRESHOLD is open value TBD
    ready = 0 // buffer is close to full
elsif (gen_en == 0) then
    ready = 0 // stalled by the datapath controller
else
    ready = 1 // everything good no stall
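In hardware, `wr_adr - rd_adr` only gives a valid fill level if the pointer arithmetic wraps correctly. The Python sketch below makes that explicit. It assumes the pointers carry one extra wrap bit beyond the 32-entry buffer (so they count modulo 64); that width, the function name and the threshold value used in the examples are assumptions, since the source leaves THRESHOLD open.

```python
BUFFER_DEPTH = 32   # from the (32 - THRESHOLD) comparison above
PTR_MODULUS = 64    # assumption: 5-bit buffer address plus one wrap bit


def llu_ready(wr_adr, rd_adr, gen_en, threshold):
    """Return the ready signal to the LLU (1 = accept dots, 0 = stall)."""
    fill_level = (wr_adr - rd_adr) % PTR_MODULUS  # wrap-safe difference
    if fill_level > BUFFER_DEPTH - threshold:
        return 0    # buffer is close to full
    if not gen_en:
        return 0    # stalled by the datapath controller
    return 1        # no stall
```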
32.9.10.2 Data generator



Machine remains in same state by default
All outputs are zero unless otherwise stated

State Description:
Reset: Normal reset state
SrclkPre: Count the SrclkPre number of clock cycles
DataGen: Read line dot data from buffer
MarginGen: Generate DotMargin number of dots
SrclkPost: Wait for SrclkPost number of cycles



Figure 250. Data generator state diagram 

The data generator block reads data from the dot buffer and feeds dot data to the printhead at a configured rate (PrintHeadRate). It also generates the margin zero data and aligns the dot generation to the synchronization pulse from the PHI controller.

The data generator controller waits in the Reset state until it receives a line start pulse from the PHI controller (data_st signal). Once a start pulse is received it proceeds to the SrclkPre state, loading the counter with the SrclkPre value. While in the SrclkPre state it decrements the counter. No data is read or output at this time. When the count is zero the machine proceeds to the DataGen state.

On transition it loads the counter with the printhead size (print_head_size). If margining is to be used then the configured print_head_size should be adjusted by the dot margin value, i.e. print_head_size = (physical_print_head_size - (dot_margin * 2)).

While in the DataGen state data is read from the dot buffer and output to the printhead. The counter is decremented each time a dot is read from the buffer. The data rate logic is given by:

// increment the rate count
rate_cnt++

// determine if data should be read
// first determine if data is available in buffer
if (rd_adr != wr_adr) then
    if (print_head_rate[rate_cnt] == 1) then
        dot_active = 1
        gate_srclk = 1
        rd_adr++
        dot_data = rd_data
        count--
    else
        dot_active = 0
        gate_srclk = 0
else
    dot_active = 0
    gate_srclk = 0
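The effect of the 16-bit PrintHeadRate pattern on gate_srclk can be previewed in software. The Python sketch below assumes rate_cnt is a 4-bit counter that wraps at 16 and that the rate bit is sampled before the counter advances; neither detail is stated explicitly in the source, and the function name is hypothetical. Buffer availability is reduced to a count of dots waiting in the dot buffer.

```python
def rate_gate(print_head_rate, cycles, dots_available):
    """Return gate_srclk (True/False) for each phiclk cycle."""
    rate_cnt = 0
    remaining = dots_available          # stands in for rd_adr != wr_adr
    gates = []
    for _ in range(cycles):
        rate_bit = (print_head_rate >> rate_cnt) & 1
        active = remaining > 0 and rate_bit == 1
        if active:
            remaining -= 1              # rd_adr++ and count--
        gates.append(active)
        rate_cnt = (rate_cnt + 1) % 16  # assumption: 4-bit wrap
    return gates
```

The reset value 0xFFFF transfers a dot on every phiclk cycle, while a pattern such as 0x5555 halves the effective printhead data rate.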

When the counter reaches zero the state machine will jump to the MarginGen state if the configured margin value is non-zero; otherwise it will jump directly to the SrclkPost state. On transition to the MarginGen state it loads the cycle counter with the dot_margin value and begins to count down. While in the MarginGen state the data generator logic block writes dot data to the printhead but does not read from the dot buffers. It creates zero dot data words for the margin duration.

When the counter reaches zero the machine jumps to the SrclkPost state, loads the clock counter with the SrclkPost value and decrements it. When the count is finished the state machine returns to the Reset state and awaits the next start pulse. Should a line sync arrive before the data generators have completed (data_fin signal), the PHI controller will detect a print error and stall the PHI interface.



32.9.10.3 Data serializer 

The data serializer block converts 6-bit dot data at phiclk rates (nominally 106 MHz) to 2-bit data at doclk rates (nominally 320 MHz).
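As a consistency check on these nominal rates (not an additional requirement), the serializer must move the same number of bits per second on both sides. This implies three doclk cycles per phiclk cycle, one for each 2-bit slice of a 6-bit dot word:

```latex
106\,\text{MHz} \times 6\,\text{bits} = 636\,\text{Mbit/s}
\;\approx\; 320\,\text{MHz} \times 2\,\text{bits} = 640\,\text{Mbit/s},
\qquad
\frac{f_{\mathrm{doclk}}}{f_{\mathrm{phiclk}}} = \frac{6\,\text{bits}}{2\,\text{bits}} = 3
```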



(Timing diagram signals: phiclk, doclk, dot_data[5:0], gate_srclk, gate_srclk_del and srclk.)



Figure 251. Data serializer timing 

The srclk is only active when data is available for transfer to the printhead, as enabled by the gate_srclk signal. The data rate mechanism in the data generator block means that data is not transferred to the printhead on every phiclk cycle. Both the dot_data and gate_srclk signals are clocked out by the phiclk and can only change on the rising edge of phiclk.

The data serializer block allows easy separation of clock gating and clock to logic structures from the rest of the PHI interface. All registers in the block are clocked at doclk rates.



(RTL diagram: the mux logic takes phead_swap, dot_data[0][5:0], dot_data[1][5:0], phiclk and phi_serial_order and uses mux_sel to select ph_data[1:0]; gate_srclk[1:0] is delayed to gate_srclk_del, which drives a clock gate on doclk to produce srclk.)

Figure 252. Data serializer RTL diagram

The mux logic determines which data bits from the dot_data bus should be selected for output on ph_data to the printhead. The selection is dependent on the phiclk edge.

if (phiclk == 1) then
    mux_sel = 1
elsif (mux_sel == 2) then
    mux_sel = 0
else
    mux_sel++

The dot data serialization order can be configured by the PhiSerialOrder register. If PhiSerialOrder is zero the order is dot[1:0], dot[3:2] then dot[5:4]. If the register is one then the order is dot[5:4], dot[3:2] then dot[1:0].
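The two serialization orders can be expressed as a small Python function that splits one 6-bit dot word into the three 2-bit values sent on successive doclk cycles. The function name is hypothetical, and the sketch covers only the bit ordering, not the clock-domain timing.

```python
def serialize_dot_word(dot_word, serial_order):
    """Split a 6-bit dot word into three 2-bit ph_data values.

    serial_order 0 -> dot[1:0], dot[3:2], dot[5:4]
    serial_order 1 -> dot[5:4], dot[3:2], dot[1:0]
    """
    if not 0 <= dot_word < 64:
        raise ValueError("dot_word must fit in 6 bits")
    slices = [(dot_word >> shift) & 0x3 for shift in (0, 2, 4)]
    return slices if serial_order == 0 else slices[::-1]
```

For example, the dot word 0b111001 serializes as [1, 2, 3] with PhiSerialOrder = 0 and as [3, 2, 1] with PhiSerialOrder = 1.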
33 Test Units



33.1 JTAG Interface

A standard JTAG (Joint Test Action Group) interface is included in SoPEC for bonding and IO testing purposes. The JTAG port will provide access to all internal BIST (Built In Self Test) structures.



33.2 Scan Test I/O 



33.3 Analog Test Units

33.3.1 USB PHY Testing 

[Text illegible in source.]

33.3.2 Embedded PLL Testing 

The embedded clock generator PLL will require test access from JTAG port. 
34 SoPEC Pinning and Package

34.1 Overview 

It is intended that the SoPEC package be a 100 pin LQFP. Any spare pins in the package may be used by increasing the number of available GPIO pins or adding extra power and ground pins. The pin list shows the minimum pin requirement for the SoPEC device.



Table 166. SoPEC Pin List

Pin name | Pins | Dir | Type | Voltage | Internal signal | Description

Clocks and resets
xtalin | 1 | I | TBD | N/A | xtalin | Crystal input pin
xtalout | 1 | O | TBD | N/A | xtalout | Crystal output pin
reset_n | 1 | I | LVTTL | 2.5V | reset_n | Asynchronous active low reset

Printhead Interface
ph_data[0][0] | 2 | O | LVDS | 3.3V | phi_ph_data_o[0][0] | Dot data for colors 0-2 for Printhead 0, using differential signalling
 | | I | LVTTL | 3.3V | phi_ph_data_i[0][0] | Input mode bit used for nozzle test result, printhead 0
ph_data[0][1] | 2 | O | LVDS | 3.3V | phi_ph_data_o[0][1] | Dot data for colors 3-5 for Printhead 0, using differential signalling
 | | I | LVTTL | 3.3V | phi_ph_data_i[0][1] | Input mode bit used for temperature data, printhead 0
ph_data[1][0] | 2 | O | LVDS | 3.3V | phi_ph_data_o[1][0] | Dot data for colors 0-2 for Printhead 1, using differential signalling
 | | I | LVTTL | 3.3V | phi_ph_data_i[1][0] | Input mode bit used for nozzle test result, printhead 1
ph_data[1][1] | 2 | O | LVDS | 3.3V | phi_ph_data_o[1][1] | Dot data for colors 3-5 for Printhead 1, using differential signalling
 | | I | LVTTL | 3.3V | phi_ph_data_i[1][1] | Input mode bit used for temperature data, printhead 1
srclk[0] | 2 | O | LVDS | 3.3V | phi_srclk[0] | Differential dot data shift clock for printhead 0
srclk[1] | 2 | O | LVDS | 3.3V | phi_srclk[1] | Differential dot data shift clock for printhead 1
readl | 1 | O | LVTTL | 3.3V | phi_readl | Common printhead mode control
frclk | 1 | O | LVTTL | 3.3V | phi_frclk | Common fire pattern shift clock, needs to toggle once per fire cycle
profile | 1 | O | LVTTL | 3.3V | phi_profile | Common pulse profile for all colors
lsyncl | 1 | O | LVTTL | 3.3V | phi_lsyncl_o | Line sync output from master to slaves
 | | I | LVTTL | 3.3V | phi_lsyncl_i | Line sync input to slaves from master

USB Connections
usbd | 2 | IO | Differential | 3.3V | Direct PHY connection | USB differential data

JTAG
tdo | 1 | O | CMOS | 2.5V | tdo | JTAG test data out port
tms | 1 | I | CMOS | 2.5V | tms | JTAG test mode select
tdi | 1 | I | CMOS | 2.5V | tdi | JTAG test data in port
tck | 1 | I | CMOS | 2.5V | tck | JTAG test access port clock

General Purpose IO



Doc: SoPEC_hardware_design 
Version: 2.3 



S3 Proprietary Document 



29 Nov 2002 
Page 5.35 



SoPEC : Hardware Design 



Name          | Pins | I/O | Type            | Voltage | Internal connection | Description
gpio[3:0]     | 4    | O   | High drive CMOS | 2.5V    | gpio_o[3:0]         | Motor control pins / general purpose output
              |      | I   | CMOS            | 2.5V    | gpio_i[3:0]         | General purpose input
gpio[7:4]     | 4    | O   | Open collector  | 2.5V    | gpio_o[7:4]         | LED driver pins / general purpose output
              |      | I   | CMOS            | 2.5V    | gpio_i[7:4]         | General purpose input
gpio[11:8]    | 4    | O   | CMOS            | 2.5V    | gpio_o[11:8]        | LSS interface pins / general purpose output
              |      | I   | CMOS            | 2.5V    | gpio_i[11:8]        | LSS interface pins / general purpose input
gpio[13:12]   | 2    | O   | CMOS            | 2.5V    | gpio_o[13:12]       | ISI interface pins / general purpose output
              |      | I   | CMOS            | 2.5V    | gpio_i[13:12]       | ISI interface pins / general purpose input
35 Memjet Printhead

This section is quoted verbatim from SoPEC/MoPEC Bilithic Printhead Reference document [10]. 

35.1 Background 

Silverbrook's bilithic Memjet™ printheads are the target printheads for printing systems which will be 
controlled by SoPEC and MoPEC devices. 

This document presents the format and structure of these printheads, and describes their possible arrangements in the target systems. It also defines a set of terms used to differentiate between the types of printheads and the systems which use them.

35.2 Companion Documents 

Currently, this document is only concerned with the structure of the printheads and their systems, with 
regard to the way in which dot data is loaded. 

Refer to the Bilithic Printhead Specification [2] for the complete description of the functionality of these 
devices. 

This document relies on certain definitions and details presented in Bilithic Printhead Specification [2].

35.3 Definitions 

This document presents terminology and definitions used to describe the bilithic printhead systems. These 
terms and definitions are as follows: 

• Printhead Type - There are 3 parameters which define the type of printhead used in a system:
    • Direction of the data flow through the printhead (clockwise or anti-clockwise, with the printhead shooting ink down onto the page).
    • Location of the left-most dot (upper row or lower row, with respect to ).
    • Printhead footprint (type A or type B, characterized by the data pin being on the left or the right of where is at the top of the printhead).

• Printhead Arrangement - Even though there are 8 printhead types, each arrangement has to use a specific pairing of printheads, as discussed in Section 35.4. This gives 4 pairs of printheads. However, because the paper can flow in either direction with respect to the printheads, there are a total of eight possible arrangements, e.g. Arrangement 1 has a Type 0 printhead on the left with respect to the paper flow, and a Type 1 printhead on the right. Arrangement 2 uses the same printhead pair as Arrangement 1, but the paper flows in the opposite direction.

• Color 0 is always the first color plane encountered by the paper.

• Dot 0 is defined as the nozzle which can print a dot at the left-most side of the page.

• The Even Plane of a color corresponds to the row of nozzles that prints dot 0.

Note that throughout this document, where the various printheads and systems are presented, the printheads always shoot ink down onto the page.
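The counts in the definitions above follow from simple combinatorics. The Python sketch below enumerates the three two-valued type parameters and derives the number of arrangements. The tuple labels are illustrative; which combination carries which Type number is not stated in this section, so the sketch does not number them.

```python
from itertools import product

# The three two-valued parameters that define a printhead type.
DIRECTIONS = ("clockwise", "anti-clockwise")
DOT0_ROWS = ("upper row", "lower row")
FOOTPRINTS = ("A", "B")

# Each combination is a distinct printhead type: 2 * 2 * 2 = 8.
# The mapping of combinations to Type 0..7 is deliberately left out.
printhead_types = list(product(DIRECTIONS, DOT0_ROWS, FOOTPRINTS))

# Types must be used in fixed pairs, and each pair can see the paper flow
# in either direction, which doubles the count.
pairs = len(printhead_types) // 2
paper_directions = 2
arrangements = pairs * paper_directions

print(len(printhead_types), pairs, arrangements)  # 8 4 8
```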

Figure 253 shows the 8 different possible printhead types. Type 0 is identical to the Right Printhead presented in Figure 3 in [2], and Type 1 is the same as the Left Printhead as defined in [2].
While the printheads shown in Figure 253 appear to be of equal width (having the same number of nozzles), it is important to remember that in a typical system, a pair of unequal sized printheads may be used.
Figure 253. Printhead Types 0 to 7

Table 167 defines the printhead pairing and the location of each printhead type, with respect to the flow of paper, for the 8 possible arrangements.

Table 167. Definition of the different printhead arrangements

Arrangement   | Printhead on left side (with respect to paper flow) | Printhead on right side (with respect to paper flow)
Arrangement 1 | Type 0 | Type 1
Arrangement 2 | Type 1 | Type 0
Arrangement 3 | Type 2 | Type 3
Arrangement 4 | Type 3 | Type 2
Arrangement 5 | Type 4 | Type 5
Arrangement 6 | Type 5 | Type 4
Arrangement 7 | Type 6 | Type 7
Arrangement 8 | Type 7 | Type 6
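Table 167 follows a regular pattern: arrangements come in consecutive pairs that share a printhead pair, and the second arrangement of each pair swaps left and right because the paper flows the other way. The Python sketch below transcribes the table and checks that pattern; printhead_pair is a hypothetical helper name.

```python
# Table 167 transcribed: arrangement -> (left type, right type),
# where left/right are taken with respect to the paper flow.
TABLE_167 = {
    1: (0, 1), 2: (1, 0), 3: (2, 3), 4: (3, 2),
    5: (4, 5), 6: (5, 4), 7: (6, 7), 8: (7, 6),
}


def printhead_pair(arrangement: int) -> tuple[int, int]:
    """Compute (left type, right type) for an arrangement from the Table 167 pattern.

    Odd-numbered arrangements place the lower-numbered type of the pair on the
    left; the following even-numbered arrangement swaps the two printheads.
    """
    if not 1 <= arrangement <= 8:
        raise ValueError("arrangement must be between 1 and 8")
    pair, swapped = divmod(arrangement - 1, 2)
    low, high = 2 * pair, 2 * pair + 1
    return (high, low) if swapped else (low, high)


# Check the formula against every row of the table.
for n, expected in TABLE_167.items():
    assert printhead_pair(n) == expected, (n, expected)
```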
35.4 Bilithic Printhead Systems



When using the bilithic printheads, the position of the power/ground bars, coupled with the physical footprint of the printheads, means that we must use a specific pairing of printheads together for printing on the same side of an A4 (or wider) page, e.g. we must always use a Type 0 printhead with a Type 1 printhead, and so on.

While a given printing system can use any one of the eight possible arrangements of printheads, this document only presents two of them, Arrangement 1 and Arrangement 2, for purposes of illustration. These two arrangements are discussed in subsequent sections of this document. However, the other 6 possibilities also need to be considered.

The main difference between the two printhead arrangements discussed in this document is the direction of the paper flow. Because of this, the dot data has to be loaded differently in Arrangement 1 compared to Arrangement 2 in order to render the page correctly.



35.4.1 Example 1: Printhead Arrangement 1

Figure 254 shows an Arrangement 1 printing setup, where the bilithic printheads are arranged as follows:

• The Type 0 printhead is on the left with respect to the direction of the paper flow.

• The Type 1 printhead is on the right.
Figure 254. Identification of printhead nozzles and shift-register sequences for printheads in Arrangement 1

Table 168 lists the order in which the dot data needs to be loaded into the above printhead system, to 
ensure color 0-dot 0 appears on the left side of the printed page. 



Table 168. Order in which the even and odd dots are loaded for printhead Arrangement 1

Dots | Type 0 printhead                  | Type 1 printhead
Odd  | Loaded second in descending order | Loaded first in descending order
Even | Loaded first in ascending order   | Loaded second in ascending order
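To make Table 168 concrete, the Python sketch below builds the load order for one color of a hypothetical printhead whose nozzles print dots 0 to n_dots-1. It assumes the first data column of the table is the Type 0 printhead and the second is Type 1; the helper names are illustrative, and a real system loads every color plane and splits the print line between the two printheads.

```python
def plane(n_dots: int, parity: str, ascending: bool) -> list[int]:
    """Dot indices of one nozzle row: even dots 0, 2, ... or odd dots 1, 3, ..."""
    start = 0 if parity == "even" else 1
    dots = list(range(start, n_dots, 2))
    return dots if ascending else dots[::-1]


def arrangement1_order(n_dots: int, printhead_type: int) -> list[int]:
    """Load order of one color for a printhead in Arrangement 1 (Table 168)."""
    even = plane(n_dots, "even", ascending=True)   # even plane: ascending
    odd = plane(n_dots, "odd", ascending=False)    # odd plane: descending
    if printhead_type == 0:
        return even + odd   # Type 0: even first, odd second
    if printhead_type == 1:
        return odd + even   # Type 1: odd first, even second
    raise ValueError("Arrangement 1 uses only Type 0 and Type 1 printheads")


print(arrangement1_order(8, 0))  # [0, 2, 4, 6, 7, 5, 3, 1]
print(arrangement1_order(8, 1))  # [7, 5, 3, 1, 0, 2, 4, 6]
```

The direction within each plane is the same for both printheads; only the plane loaded first differs.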
Figure 255 shows how the dot data is demultiplexed within the printheads.



Figure 255. Demultiplexing of data within the printheads in Arrangement 1

Figure 256 and Figure 257 show the way in which the dot data needs to be loaded into the printheads in 
Arrangement 1, to ensure that color 0-dot 0 appears on the left side of the printed page. 

Figure 256. Signalling for a Type 0 printhead in Arrangement 1 




Figure 257. Signalling for a Type 1 printhead in Arrangement 1 



35.4.2 Example 2: Printhead Arrangement 2 

Figure 258 shows an Arrangement 2 printing setup, where the bilithic printheads are arranged as follows:

• The Type 1 printhead is on the left with respect to the direction of the paper flow.

• The Type 0 printhead is on the right. 
Figure 258. Identification of printhead nozzles and shift-register sequences for printheads in Arrangement 2

Table 169 lists the order in which the dot data needs to be loaded into the above printhead system, to 
ensure color 0-dot 0 appears on the left side of the printed page. 



Table 169. Order in which the even and odd dots are loaded for printhead Arrangement 2

Dots | Type 0 printhead                  | Type 1 printhead
Odd  | Loaded first in descending order  | Loaded second in descending order
Even | Loaded second in ascending order  | Loaded first in ascending order
Figure 259 shows how the dot data is demultiplexed within the printheads.

Figure 259. Demultiplexing of data within the printheads in Arrangement 2

Figure 260 and Figure 261 show the way in which the dot data needs to be loaded into the printheads in 
Arrangement 2, to ensure that color 0-dot 0 appears on the left side of the printed page. 

Figure 260. Signalling for a Type 0 printhead in Arrangement 2



Figure 261. Signalling for a Type 1 printhead in Arrangement 2



35.4.3 Conclusions 



Comparing the signalling diagrams for Arrangement 1 with those shown for Arrangement 2, it can be seen that the color/dot sequence output for a printhead type in Arrangement 1 is the reverse of the sequence for the same printhead in Arrangement 2, both in the order in which the color plane data is output and in whether even or odd data is output first. However, the order within a color plane remains the same, i.e. odd descending, even ascending.

From Figure 262 and Table 170, it can be seen that the plane which has to be loaded first (i.e. even or odd) depends on the arrangement. The order in which the dots have to be loaded (e.g. even ascending or even descending) also depends on the arrangement.
If the device controlling the printheads can re-order the bits according to the following criteria, then it should be able to operate in all the possible printhead arrangements:

• Be able to output the even or odd plane first. 

• Be able to output even and odd planes in either ascending or descending order, independently. 

• Be able to reverse the sequence in which the color planes of a single dot are output to the printhead. 
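A minimal Python sketch of a controller that meets the three criteria above. LoadConfig and load_sequence are hypothetical names; the sketch assumes the color planes of one dot are shifted out consecutively and ignores the split of colors across Data[0] and Data[1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadConfig:
    """Hypothetical controller settings, one field per criterion."""
    even_first: bool       # output the even plane before the odd plane
    even_ascending: bool   # direction of the even plane
    odd_ascending: bool    # direction of the odd plane, set independently
    reverse_colors: bool   # reverse the color-plane order for each dot


def load_sequence(n_dots: int, n_colors: int, cfg: LoadConfig) -> list[tuple[int, int]]:
    """Return (color, dot) pairs in the order they would be shifted out."""
    even = list(range(0, n_dots, 2))
    odd = list(range(1, n_dots, 2))
    if not cfg.even_ascending:
        even.reverse()
    if not cfg.odd_ascending:
        odd.reverse()
    dots = even + odd if cfg.even_first else odd + even
    colors = list(range(n_colors))
    if cfg.reverse_colors:
        colors.reverse()
    # For each dot position, the bit for every color plane is sent in turn.
    return [(color, dot) for dot in dots for color in colors]


# Example: Type 0 printhead in Arrangement 1 (Table 168): even ascending first,
# odd descending second. reverse_colors=False is an arbitrary choice here.
ARR1_TYPE0 = LoadConfig(even_first=True, even_ascending=True,
                        odd_ascending=False, reverse_colors=False)
print([dot for _, dot in load_sequence(4, 3, ARR1_TYPE0)[::3]])  # [0, 2, 3, 1]
```

Keeping the four settings independent is what lets a single controller cover every row of Table 170 without hard-coding the arrangements.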
Figure 262. All 8 Printhead Arrangements



Table 170. Order in which even and odd dots and planes are loaded into the various printhead arrangements

Arrangement   | Loading order                                              | Loading order
Arrangement 1 | Even ascending loaded first, odd descending loaded second | Odd descending loaded first, even ascending loaded second
Arrangement 2 | Odd descending loaded first, even ascending loaded second | Even ascending loaded first, odd descending loaded second
Arrangement 3 | Odd ascending loaded first, even descending loaded second | Even descending loaded first, odd ascending loaded second
Arrangement 4 | Even descending loaded first, odd ascending loaded second | Odd ascending loaded first, even descending loaded second
Arrangement 5 | Odd ascending loaded first, even descending loaded second | Even descending loaded first, odd ascending loaded second
Arrangement 6 | Even descending loaded first, odd ascending loaded second | Odd ascending loaded first, even descending loaded second
Arrangement 7 | Even ascending loaded first, odd descending loaded second | Odd descending loaded first, even ascending loaded second
Arrangement 8 | Odd descending loaded first, even ascending loaded second | Even ascending loaded first, odd descending loaded second
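Table 170 has a compact structure: within each row the two columns load the same two planes in opposite order, consecutive arrangements (1 and 2, 3 and 4, and so on) exchange their columns, and only two plane-direction conventions occur. The Python sketch below transcribes the table and checks these properties; the names are illustrative, and the two printhead columns are referred to simply as the first and second column.

```python
# Table 170 transcribed. Each cell is (plane loaded first, plane loaded second),
# and each plane is (parity, direction).
EA, ED = ("even", "asc"), ("even", "desc")
OA, OD = ("odd", "asc"), ("odd", "desc")

TABLE_170 = {
    1: ((EA, OD), (OD, EA)),
    2: ((OD, EA), (EA, OD)),
    3: ((OA, ED), (ED, OA)),
    4: ((ED, OA), (OA, ED)),
    5: ((OA, ED), (ED, OA)),
    6: ((ED, OA), (OA, ED)),
    7: ((EA, OD), (OD, EA)),
    8: ((OD, EA), (EA, OD)),
}


def columns_are_mirrored(n: int) -> bool:
    """True if the two columns of a row load the same planes in opposite order."""
    first_col, second_col = TABLE_170[n]
    return first_col == second_col[::-1]


def rows_are_swapped(a: int, b: int) -> bool:
    """True if row b is row a with its two columns exchanged."""
    return TABLE_170[a] == TABLE_170[b][::-1]


# Arrangements whose planes run even-ascending / odd-descending.
even_ascending_rows = sorted(n for n, row in TABLE_170.items() if EA in row[0])

print(all(columns_are_mirrored(n) for n in TABLE_170))        # True
print(all(rows_are_swapped(n, n + 1) for n in (1, 3, 5, 7)))  # True
print(even_ascending_rows)                                    # [1, 2, 7, 8]
```

Arrangements 3 to 6 use the other convention (odd ascending, even descending), which is why the controller must be able to set each plane's direction independently.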